Abstract

In this paper we study the diameter of the random graph $G(n,p)$, i.e., the
largest finite distance between two vertices, for a wide range of functions $p = p(n)$. For
$p = \lambda/n$ with $\lambda > 1$ constant, we give a simple proof of an essentially best possible
result, with an $O_p(1)$ additive correction term. Using similar techniques, we establish
2-point concentration in the case that $np \to \infty$. For $p = (1+\varepsilon)/n$ with $\varepsilon \to 0$, we
obtain a corresponding result that applies all the way down to the scaling window of
the phase transition, with an $O_p(1/\varepsilon)$ additive correction term whose (appropriately
scaled) limiting distribution we describe. Combined with earlier results, our new results 
complete the determination of the diameter of the random graph G(n,p) to an accuracy 
of the order of its standard deviation (or better), for all functions p = p(n). Throughout 
we use branching process methods, rather than the more common approach of separate 
analysis of the 2-core and the trees attached to it. 

1 Introduction and main results 

Throughout, we write diam(G) for the diameter of a graph G, meaning the largest graph 
distance d(x, y) between two vertices x and y in the same component of G: 

$$\operatorname{diam}(G) = \max\{d(x,y) : x, y \in V(G),\ d(x,y) < \infty\}.$$

In this paper we shall study the diameter of the random graph G(n,p) with vertex set 
$[n] = \{1, 2, \ldots, n\}$, where each possible edge is present with probability $p = p(n)$,
independently of the others. For certain functions $p = p(n)$, tight bounds on the diameter
of $G(n,p)$ are known; our main aim is to prove such bounds for all remaining
functions. In particular, in the special case $p = \lambda/n$ with $\lambda > 1$ constant we shall determine
the diameter up to an additive error term that is bounded in probability, compared to
earlier results with an $o(\log n)$ error term. A secondary aim is to present a particularly
simple proof in this case. All our results apply just as well to $G(n, m)$; in the range
of parameters we consider there is essentially no difference between the models. More
precisely, although the results for one model do not obviously transfer to the other, the 
proofs for G(n, m) are essentially the same. 
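
To make the definition of the diameter concrete, the following minimal Python sketch samples $G(n,p)$ and computes the largest finite distance by breadth-first search from every vertex. The function names (`sample_gnp`, `eccentricity`, `diameter`) are illustrative and do not come from the paper, and the quadratic-time sampler is intended only for small $n$.

```python
import random
from collections import deque


def sample_gnp(n, p, rng):
    """Sample G(n, p) as adjacency lists: each pair is an edge independently with prob. p."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def eccentricity(adj, source):
    """Largest finite graph distance from `source` (vertices in other components are ignored)."""
    dist = {source: 0}
    queue = deque([source])
    farthest = 0
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                farthest = max(farthest, dist[w])
                queue.append(w)
    return farthest


def diameter(adj):
    """diam(G): the maximum of d(x, y) over pairs x, y in the same component."""
    return max(eccentricity(adj, x) for x in range(len(adj)))


if __name__ == "__main__":
    graph = sample_gnp(300, 2.0 / 300, random.Random(0))
    print(diameter(graph))
```

Because distances are only compared within components, isolated vertices and small components never make the diameter infinite; they simply contribute their own (small) eccentricities.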

We treat three ranges of $p = \lambda/n$ separately: $\lambda > 1$ constant, $\lambda \to \infty$ but with an
upper bound on growth rate that extends well into the range covered by classic results,
and finally $\lambda \to 1$ with an appropriate lower bound on growth rate. In each case, our
analysis investigates the neighbourhoods of vertices, and has three components: 'early 
growth' — we study the distribution of the numbers of vertices at distance t when t is 
small; 'regular growth' in the middle — we show that the number of vertices at distance t 
is very likely to grow regularly once the neighbourhoods have become 'moderately large'; 
'meeting up' — we show that the distance between two vertices is almost determined 
by the times their respective neighbourhoods take to become 'large'. This is eventually 
translated into a result on the diameter. 

Our overall plan is made possible by the very accurate information we obtain on the 
first phase (early growth). The main approach for this is to compare the neighbourhoods
of a vertex of $G(n,\lambda/n)$ with the standard Poisson Galton-Watson branching process
$\mathcal{X}_\lambda = (X_t)_{t \ge 0}$; this starts with a single particle in generation 0, and each particle in
generation $t$ has a Poisson $\mathrm{Po}(\lambda)$ number of children in the next generation, independently
of the other particles and of the history.
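
As an illustration of this branching process, here is a small Python sketch that simulates the generation sizes $|X_0|, |X_1|, \ldots$. It is a sketch only: the helper names are hypothetical, and the Poisson sampler uses Knuth's multiplication method, which is adequate for the small means considered here but is an implementation choice of this example, not something taken from the paper.

```python
import math
import random


def poisson(lam, rng):
    """Sample a Po(lam) variable by Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def generation_sizes(lam, generations, rng):
    """Sizes |X_0|, ..., |X_generations| of a Poisson Galton-Watson process."""
    sizes = [1]  # a single particle in generation 0
    for _ in range(generations):
        # each particle has an independent Po(lam) number of children
        sizes.append(sum(poisson(lam, rng) for _ in range(sizes[-1])))
    return sizes


if __name__ == "__main__":
    print(generation_sizes(2.0, 10, random.Random(0)))
```

Once a generation is empty, all later generations are empty as well, which is the extinction event discussed next.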

A particle in the process $\mathcal{X}_\lambda$ survives if it has descendants in all later generations; the
process survives if the initial particle survives. If $\lambda > 1$, then the survival probability
$s = \mathbb{P}(\forall t : |X_t| > 0)$ is the unique positive solution to

$$1 - s = e^{-\lambda s}. \tag{1}$$

Since particles in generation 1 survive independently of each other, the number of such
particles that survive has a $\mathrm{Po}(s\lambda)$ distribution, the number that die has a $\mathrm{Po}((1-s)\lambda)$
distribution, and these numbers are independent. It follows that conditioning on the
process dying, we obtain again a Poisson Galton-Watson process $\mathcal{X}_{\lambda_*} = (X_t^-)_{t \ge 0}$, with
the 'dual' parameter

$$\lambda_* = \lambda(1 - s), \tag{2}$$

which may also be characterized as the solution $\lambda_* < 1$ to

$$\lambda_* e^{-\lambda_*} = \lambda e^{-\lambda}. \tag{3}$$

This parameter is crucial to understanding the diameter of $G(n, \lambda/n)$. For this and
other basic branching process results, see, for example, Athreya and Ney [3]. 
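
The survival probability and the dual parameter are easy to compute numerically. The sketch below is one possible approach, not the paper's: it iterates the map $s \mapsto 1 - e^{-\lambda s}$ starting from $s = 1$, which decreases monotonically to the positive root because the map is increasing and concave. The function names and the tolerance are assumptions of this example.

```python
import math


def survival_probability(lam, tol=1e-15, max_iter=1_000_000):
    """Positive solution s of 1 - s = exp(-lam * s); requires lam > 1."""
    if lam <= 1:
        raise ValueError("a positive root exists only for lam > 1")
    s = 1.0  # start above the root; the iterates decrease towards it
    for _ in range(max_iter):
        s_next = 1.0 - math.exp(-lam * s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s


def dual_parameter(lam):
    """The dual parameter lam * (1 - s) associated with lam > 1."""
    return lam * (1.0 - survival_probability(lam))


if __name__ == "__main__":
    lam = 2.0
    lam_star = dual_parameter(lam)
    print(survival_probability(lam), lam_star)
    # both sides of the characterisation x * exp(-x) = lam * exp(-lam)
    print(lam_star * math.exp(-lam_star), lam * math.exp(-lam))
```

The iteration contracts at rate roughly equal to the dual parameter itself, so it slows down as $\lambda$ approaches 1; a bracketing root finder would be a reasonable substitute in that regime.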

Our first aim is to give a proof of a tight estimate for the diameter of $G(n, \lambda/n)$
when $\lambda > 1$ is constant as $n \to \infty$ that is simpler than our result for the general case,
and also compares favourably with the existing proofs of much weaker bounds for the 
more general models discussed below. 

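Theorem 1. Let $\lambda > 1$ be fixed, and let $\lambda_* < 1$ satisfy $\lambda_* e^{-\lambda_*} = \lambda e^{-\lambda}$. Then

$$\operatorname{diam}(G(n, \lambda/n)) = \frac{\log n}{\log \lambda} + 2\,\frac{\log n}{\log(1/\lambda_*)} + O_p(1). \tag{4}$$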
As usual, we say that an event holds with high probability, or whp, if its probability
tends to 1 as $n \to \infty$. Theorem 1 simply says that, for any $K = K(n) \to \infty$, the
diameter is whp within $K$ of the sum of the first two terms.

The proof of Theorem 1 is fairly simple, and will be given in Section 2.
Turning to the case $\lambda = \lambda(n) \to \infty$, we obtain the following result, proved in Section 3
using essentially the same method, although there are various additional complications.

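Theorem 2. Let $\lambda = \lambda(n)$ satisfy $\lambda \to \infty$ and $\lambda < n^{1/1000}$, and let $\lambda_* < 1$ satisfy
$\lambda_* e^{-\lambda_*} = \lambda e^{-\lambda}$. Then $\operatorname{diam}(G(n, \lambda/n))$ is two-point concentrated: there exists a function
$f(n, \lambda)$ satisfying

$$f(n, \lambda) = \frac{\log n}{\log \lambda} + 2\,\frac{\log n}{\log(1/\lambda_*)} + O(1)$$

such that whp $\operatorname{diam}(G(n, \lambda/n)) \in \{f(n, \lambda), f(n, \lambda) + 1\}$. Furthermore, for any $\varepsilon > 0$
and any function $\lambda$ such that neither $\log n / \log(1/\lambda_*)$ nor $\log n / \log \lambda$ is within $\varepsilon$ of an
integer, we have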



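$$\operatorname{diam}(G(n, \lambda/n)) = \left\lceil \frac{\log n}{\log \lambda} \right\rceil + 2 \left\lfloor \frac{\log n}{\log(1/\lambda_*)} \right\rfloor + 1$$

whp.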



Bruce Reed has announced a related result, in joint work with Nikolaos Fountoulakis 
independent of our work; the details are still to appear. We believe that the methods 
used are quite different. 

The main interest of Theorem 2 is when $\lambda$ tends to infinity fairly slowly; if $\lambda$ grows
significantly faster than $\log n$, then the situation is much simpler, and much more precise
results are known. Indeed, when $\lambda/(\log n)^3 \to \infty$, Bollobás [6] showed concentration of
the diameter on at most two values, and found the asymptotic probability of each value.
In the light of this result we would lose nothing by assuming that $\lambda < (\log n)^4$, say; the
bound $\lambda < n^{1/1000}$ turns out to be enough for our arguments.

The bulk of the paper is devoted to the case of expected degree tending to 1, where 
we prove the following result. 

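Theorem 3. Let $\varepsilon = \varepsilon(n)$ satisfy $0 < \varepsilon < 1/10$ and $\varepsilon^3 n \to \infty$. Set $\lambda = \lambda(n) = 1 + \varepsilon$,
and let $\lambda_* < 1$ satisfy $\lambda_* e^{-\lambda_*} = \lambda e^{-\lambda}$. Then

$$\operatorname{diam}(G(n, \lambda/n)) = \frac{\log(\varepsilon^3 n)}{\log \lambda} + 2\,\frac{\log(\varepsilon^3 n)}{\log(1/\lambda_*)} + O_p(1/\varepsilon). \tag{5}$$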

Before turning to the proofs, let us briefly discuss these results and their relationship 
to earlier work. 

Theorem 1 is best possible in the following sense: it is not hard to see that the
diameter cannot be concentrated on a set of values with bounded size as $n \to \infty$.
Indeed, given any (labelled) graph $G$ with diameter $d$ and at least two isolated vertices,
let $G'$ be constructed from $G$ by taking a longest path $P$ in $G$ and two isolated vertices
of $G$, and adding an edge joining each end of $P$ to one of these vertices. Each graph $G'$
constructed in this way contains a unique longest path, of length $d + 2$, and $G$ may be
recovered uniquely from $G'$ by deleting the first and last edges of this path. Restricting
our attention to graphs $G$ with $\Theta(n)$ isolated vertices, the relation $(G, G')$ is thus 1
to $\Theta(n^2)$. Since the probability of $G'$ in the model $G(n,p)$, $p = \lambda/n$, is equal to the
probability of $G$ multiplied by $p^2/(1-p)^2 = \Theta(1/n^2)$, it follows easily that for any $d$ we
have

$$\mathbb{P}(\operatorname{diam}(G(n, \lambda/n)) = d + 2) \ge \Theta(1)\,\mathbb{P}(\operatorname{diam}(G(n, \lambda/n)) = d) - o(1).$$

(The $o(1)$ term comes from the possibility that $G(n,\lambda/n)$ has fewer than $\Theta(n)$ isolated
vertices.) It follows that $\operatorname{diam}(G(n, \lambda/n))$ cannot be concentrated on a finite set of
values. In fact, our methods allow us to obtain the limiting distribution of the $O_p(1)$
correction term in (4), although this is rather complicated to describe; we return to this
briefly in Section 5.

A much weaker form of Theorem 1, with an $o(\log n)$ correction term, is a special case
of a result of Fernholz and Ramachandran [19] for random graphs with a given degree
sequence, and also of a result of Bollobás, Janson and Riordan [9, Section 14.2] for
inhomogeneous random graphs with a finite number of vertex types. We shall follow
the ideas of [9] to some extent, although the present simpler context allows us to take
things much further, obtaining a much more precise result. Earlier, Chung and Lu [13]
also studied $\operatorname{diam}(G(n, \lambda/n))$, $\lambda > 1$ constant, but their results were not strong enough
to give the correct asymptotic form. Indeed, they conjectured that, under suitable
conditions, the diameter is approximately $\log n / \log \lambda$, as one might initially expect.

For the subcritical case, which is much simpler, Łuczak [26] proved very precise
results: he showed, for example, that if $\varepsilon \to 0$ and $\varepsilon^3 n \to \infty$, then the subcritical
random graph $G = G(n, (1 - \varepsilon)/n)$ satisfies

$$\operatorname{diam}(G) = \frac{\log(2\varepsilon^3 n) + O_p(1)}{-\log(1 - \varepsilon)}; \tag{6}$$

see his Theorem 11(iii), and note that the exponent 2 instead of 3 appearing there is a
typographical error. (He also proved a simple formula for the limiting distribution of
the $O_p(1)$ term: the probability that it exceeds a constant $\rho$ tends to $1 - \exp(-e^{-\rho})$ as
$n \to \infty$; the limiting distribution in the present supercritical case turns out to be much
more complicated.) Łuczak's results are effectively the last word on the subcritical case,
which we shall not discuss further.

Returning to constant $\lambda > 1$, the lack of concentration on a finite number of points
contrasts with the case of random $d$-regular graphs studied by Bollobás and Fernandez
de la Vega [8], who established concentration on a small set of values in this case.
Sanwalani and Wormald [31] have recently shown 2-point concentration. (More precisely,
they prove 1-point concentration for almost all $n$, and for the remaining $n$ find the
probabilities of the 2 likely values within $o(1)$.) Note that the diameter in this case is
simply $\log(n \log n)/\log(d-1) + O_p(1)$ for $d \ge 3$; as we shall see, the behaviour of the two
models for this question is very different. Usually, $G(n,\lambda/n)$ is much simpler to study
than a random regular graph, but here there are additional complications corresponding
to the $2\log n/\log(1/\lambda_*)$ term in (4).

Let us briefly mention a few related results for other random graph models. Perhaps
the earliest results in this area are those of Burtin [11, 12] and Bollobás [6]. Turning
to results determining the asymptotic diameter when the average degree is constant,
one of the first is the result of Bollobás and Fernandez de la Vega [8] for $d$-regular
random graphs mentioned above; another is that of Bollobás and Chung [7], finding the
asymptotic diameter of a cycle plus a random matching, which is again logarithmic.
Later it was shown by 'small subgraph conditioning' (see [32]) that for such graphs any
whp statements are essentially the same as for the uniform model of random 3-regular
graphs. The same goes for a variety of other random regular graphs constructed by
superposing random regular graphs of various types. For a rather different model, namely
a precise version of the Barabási-Albert 'growth with preferential attachment' model,
Bollobás and Riordan [10] obtained a (slightly) sublogarithmic diameter, contradicting
the logarithmic diameter suggested by Barabási, Albert and Jeong [2, 5] (on the
basis of computer experiments) and Newman, Strogatz and Watts [28] (on the basis of
heuristics).

More recently, related results, often concerning the 'typical' distance between
vertices, rather than the diameter, have been proved by many people, for various models.
A few examples are the results of Chung and Lu [15], and van den Esker, van der
Hofstad, Hooghiemstra, van Mieghem and Znamenski [21, 18, 22]; for a discussion of
related work see [21], for example.

The formula (4) is easy to understand intuitively: typically, the size of the $d$-neighbourhood
of a vertex (the set of vertices at distance $d$) grows by a factor of $\lambda$
at each step (i.e., as $d$ is increased by one). Starting from two typical vertices, taking
$\log(\sqrt{n})/\log \lambda$ steps from each, the neighbourhoods reach size about $\sqrt{n}$; at around this
point the neighbourhoods are likely to overlap, so the typical distance between vertices
is $\log n/\log \lambda$. The second term in (4) comes from exceptional vertices whose neighbourhoods
take some time to start expanding, or, equivalently, from the few very longest
trees attached to (typical vertices of) the 2-core of $G(n,\lambda/n)$, the maximal subgraph
with no vertices of degree 0 or 1. It is well known that the trees hanging off the 2-core
of $G(n, \lambda/n)$ have roughly the distribution of the branching process $\mathcal{X}_{\lambda_*}$; hence, some of
these trees will have height roughly $\log n/\log(1/\lambda_*)$, and it turns out that the diameter
arises by considering two trees of (almost) maximal height attached to vertices in the
2-core at (almost) typical distance.

Although we shall use the 2-core viewpoint later, its use has an intrinsic difficulty 
caused by the significant variation in the distances between vertices in the 2-core. One 
can view the variation in the distance between two random vertices of $G = G(n, \lambda/n)$ as
coming from three sources: (i) variation in the distances to the 2-core, (ii) variation in
the times the neighbourhoods in the 2-core take to start expanding, and (iii) variation in
the time the neighbourhoods of the two vertices take to join up once they have reached 
a certain size. An advantage of our approach is that it seamlessly integrates (i) and 
(ii), by looking simply at neighbourhood growth in the whole graph G. Taking this 
viewpoint, the dual parameter $\lambda_*$ arises as follows: let $X_t^+ \subset X_t$ be the set of particles
of $\mathcal{X}_\lambda$ that survive (have descendants in all future generations). Then $X_0^+$ contains the
initial particle with probability $s$, and is empty otherwise. Moreover, conditioning on
a particle being in $X_t^+$ is exactly the same as conditioning on at least one of its children
surviving, so the number of surviving children then has the distribution $Z = Z_\lambda$ of a
$\mathrm{Po}(s\lambda)$ random variable conditioned to be at least 1. Hence, $(X_t^+)$ is again a Galton-Watson
branching process, but now with offspring distribution $Z$, and $X_0^+$ either empty
or, with probability $s$, consisting of a single particle. Note that

$$\mathbb{P}(Z = 1) = \frac{s\lambda e^{-s\lambda}}{1 - e^{-s\lambda}} = \frac{s\lambda(1 - s)}{s} = \lambda_*, \tag{7}$$

using (1) and (2). Hence, the probability that $X_t^+$ consists of a single particle, given
that the whole process survives, is exactly $\lambda_*^t$. Roughly speaking, this event corresponds
to the branching process staying 'thin' for $t$ generations, i.e., the neighbourhood growth
process taking time t to 'get going'. In the next section we shall prove a more precise 
version of this statement. 

Turning to Theorem 3, the form of the diameter here differs from what one might
expect in the presence of the factors $\varepsilon^3$ inside the logarithms in the numerators. Very
loosely speaking, these factors turn out to be related to the fact that the branching
process survives with probability $\Theta(\varepsilon)$, and then usually has size $\Theta(1/\varepsilon)$ larger than its
unconditional expected size, as well as to the fact that, roughly speaking, it takes order
of $1/\varepsilon$ generations for anything much to happen; we shall return to this at various points.
An alternative way of thinking about these factors is that the 'interesting' structure of
$G(n,p)$ is captured by the kernel, the graph obtained from the 2-core by suppressing
vertices of degree 2. The results of Łuczak [25] or alternatively Pittel and Wormald [29]
imply in particular that the number of vertices in the kernel is asymptotically $8\varepsilon^3 n/6$.

Remark 1. In the first draft of this paper, we obtained a slightly weaker form of
Theorem 3, giving the same conclusion but requiring an additional assumption, that $\varepsilon^3 n$
grows at least as fast as an explicit extremely slowly growing function of $n$ (essentially
$\log^* n$, i.e., the minimum $k$ such that the $k$th iterated logarithm of $n$ is less than 1).
This is a less restrictive assumption than what is common in related contexts, that $\varepsilon^3 n$
is at least some power of $\log n$. Since then, Ding, Kim, Lubetzky and Peres [16, 17]
have obtained a form of Theorem 3 with a larger error term (a multiplicative factor of
$1 + o(1)$), valid whenever $\varepsilon^3 n \to \infty$ and $\varepsilon \to 0$; under these assumptions, $\log \lambda \sim \varepsilon$
and $\log(1/\lambda_*) \sim \varepsilon$, so the diameter is $(3 + o_p(1)) \log(\varepsilon^3 n)/\varepsilon$. Their approach, based
around the 2-core and kernel, is very different to ours. Seeing this paper stimulated us
to remove the unnecessary restriction on $\varepsilon$; it turned out that one simple observation
(Lemma 38 below) was the main missing ingredient.
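
The first-order approximation mentioned in Remark 1 can be checked numerically. The Python sketch below compares the two explicit terms of Theorem 3 with the approximation $3\log(\varepsilon^3 n)/\varepsilon$. It is illustrative only: the function names are hypothetical, the dual parameter is found by bisection on $x e^{-x} = \lambda e^{-\lambda}$ over $(0,1)$ (an implementation choice of this example), and the correction term of order $1/\varepsilon$ is ignored entirely.

```python
import math


def dual_parameter(lam):
    """Solve x * exp(-x) = lam * exp(-lam) for x in (0, 1) by bisection (lam > 1)."""
    target = lam * math.exp(-lam)
    lo, hi = 0.0, 1.0  # x * exp(-x) is increasing on [0, 1]
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid * math.exp(-mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def leading_terms(n, eps):
    """The two explicit terms of Theorem 3 with lam = 1 + eps (error term omitted)."""
    lam = 1.0 + eps
    lam_star = dual_parameter(lam)
    big_log = math.log(eps ** 3 * n)
    return big_log / math.log(lam) + 2 * big_log / math.log(1 / lam_star)


def first_order(n, eps):
    """The approximation 3 * log(eps^3 n) / eps, valid as eps -> 0."""
    return 3 * math.log(eps ** 3 * n) / eps


if __name__ == "__main__":
    n, eps = 10 ** 12, 0.01
    print(leading_terms(n, eps), first_order(n, eps))
```

For small $\varepsilon$ the two quantities agree up to a relative error of order $\varepsilon$, consistent with $\log \lambda \sim \varepsilon$ and $\log(1/\lambda_*) \sim \varepsilon$.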

In Theorem 3, the condition $\varepsilon < 1/10$ is imposed simply for convenience; this may
be weakened to $\varepsilon = O(1)$ without problems. However, for $\varepsilon = O(1)$ bounded away from
zero one can instead apply Theorem 1; it is not hard to check that the constants implicit
in the correction term vary smoothly with $\lambda$, and so are bounded over any compact set
of $\varepsilon > 0$. For this reason, in proving Theorem 3 we may assume that $\varepsilon \to 0$ as $n \to \infty$;
we shall do this whenever it is convenient.

On the other hand, the condition $\varepsilon n^{1/3} \to \infty$ is almost certainly necessary for our
method to give nontrivial information. If $\varepsilon n^{1/3}$ is bounded, then we are inside the
'window' of the phase transition, so $G(n, (1 + \varepsilon)/n)$ is qualitatively similar in behaviour
to $G(n, 1/n)$, and the behaviour of the diameter is much more complicated than outside
the window. For one thing, there is no longer a unique 'giant' component that is
much larger than all other components. Also, the 2-core of each non-tree component
contains only a bounded number of cycles; to study the distribution of the diameter
accurately, one needs to study the distribution of the lengths of these cycles, which
is very different from the situation with supercritical graphs. Nachmias and Peres [27]
showed that inside the window the diameter of the largest component is $O_p(n^{1/3})$, with a
corresponding lower bound; more recently, Addario-Berry, Broutin and Goldschmidt [1]
have established convergence in distribution of the rescaled diameter, and given a (rather
complicated) description of the limit in terms of continuum random trees.

Finally, as in Theorem 1, the $O_p(1/\varepsilon)$ correction term in Theorem 3 is in some sense
best possible. Our method in fact gives a description of the limiting distribution of this
correction term; see Theorem 40 in Section 5.

In summary, the results of Łuczak [26] (below the critical window), Addario-Berry,
Broutin and Goldschmidt [1] (inside the window), Theorem 3 (above but average degree
tending to 1), Theorem 1 (constant average degree), Theorem 2 (average degree tending
to infinity slowly) and Bollobás [6] (average degree tending to infinity quickly) together
establish tight bounds on the diameter of G(n,p) throughout the entire range of the 
parameters. 

2 The case $p = \lambda/n$, $\lambda > 1$ constant

In this section we shall prove Theorem 1. We start by recalling a basic fact about
branching processes.

From standard branching process results (see, for example, Athreya and Ney [3]),
the martingale $|X_t|/\lambda^t$ converges almost surely to a random variable $Y = Y_\lambda$ whose
distribution (which depends on $\lambda$) is continuous except for mass $1 - s$ at 0, with strictly
positive density on $\mathbb{R}^+$. Furthermore, $Y = 0$ coincides (except possibly on a set of
measure zero) with the event that the branching process dies out. Since almost sure
convergence implies convergence in probability, a trivial consequence of this is that, for
$\lambda > 1$ and $0 < c_1 < c_2$ all fixed,

$$\inf_t \mathbb{P}(c_1 \lambda^t \le |X_t| \le c_2 \lambda^t) > 0, \tag{8}$$

where the infimum is over all $t \ge 1$ such that the interval $[c_1 \lambda^t, c_2 \lambda^t]$ contains an integer.
The following result indicates that unusually small populations in a given generation 
are typically due (at least, with a significant probability) to a branching process that 
stays essentially nonbranching (with only small 'side branches') until a point where it 
branches at a typical rate. 

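Lemma 4. Let $\lambda > 1$ be fixed. There are constants $c, C > 0$ such that for every $\omega \ge 2$
and $t \ge 1$ we have

$$c \min\{\lambda_*^{t - t_1}, 1\} \le \mathbb{P}(0 < |X_t| < \omega) \le C \lambda_*^{t - t_1}, \tag{9}$$

where $t_1 = \lfloor \log \omega / \log \lambda \rfloor$.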

Proof. The lemma is essentially a statement about the asymptotics of $Y$ near 0; this
statement follows, for example, from a result of Harris [20]. However, translating back to
a statement about $X_t$ rather than $Y$ would introduce an extra error term, corresponding
to the probability that $|X_t|/\lambda^t$ still differs from $Y$ by more than a constant factor when
$|X_t|$ first exceeds $\omega$, so we shall give a direct proof.

We start by proving the upper bound. Conditioned on $\mathcal{X}_\lambda = (X_t)_{t \ge 0}$ dying out, an
event of probability $1 - s$, this process has the distribution of the subcritical process
$\mathcal{X}_{\lambda_*} = (X_t^-)_{t \ge 0}$. Hence,

$$\mathbb{P}(|X_t| > 0,\ \exists t' : X_{t'} = \emptyset) = (1 - s)\,\mathbb{P}(|X_t^-| > 0) \le (1 - s)\,\mathbb{E}(|X_t^-|) = (1 - s)\lambda_*^t.$$

Let $p_t = \mathbb{P}(|X_t| > 0)$. Then

$$p_t = s + \mathbb{P}(|X_t| > 0,\ \exists t' : X_{t'} = \emptyset) = s + O(\lambda_*^t). \tag{10}$$

Let us note for later that the implicit constant is independent of $\lambda$; indeed, it may be
taken to be 1.
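
The bound just derived can be checked numerically. The initial particle has a descendant in generation $t$ exactly when at least one of its children has a descendant $t - 1$ generations later, and the number of such children is Poisson with mean $\lambda p_{t-1}$, so $p_t = 1 - e^{-\lambda p_{t-1}}$ with $p_0 = 1$. The Python sketch below (hypothetical function names; $s$ is approximated by iterating the same recursion for many steps) tabulates $p_t - s$ against $(1 - s)\lambda_*^t$.

```python
import math


def survival_to_generation(lam, t_max):
    """p_t = P(|X_t| > 0) for t = 0..t_max, via p_t = 1 - exp(-lam * p_{t-1})."""
    probs = [1.0]
    for _ in range(t_max):
        probs.append(1.0 - math.exp(-lam * probs[-1]))
    return probs


def check_bound(lam, t_max):
    """Rows (t, p_t - s, (1 - s) * lam_star**t), for comparison of the last two entries."""
    # A long run of the same recursion approximates s to machine precision.
    s = survival_to_generation(lam, 10_000)[-1]
    lam_star = lam * (1.0 - s)
    probs = survival_to_generation(lam, t_max)
    return [(t, p - s, (1.0 - s) * lam_star ** t) for t, p in enumerate(probs)]


if __name__ == "__main__":
    for t, gap, bound in check_bound(2.0, 10):
        print(t, gap, bound)
```

In each row the gap $p_t - s$ should be nonnegative and at most the bound, with equality at $t = 0$.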

We may partition $X_1$, the set of children of the initial particle, into two sets: the
set $S$ consisting of those that have descendants $t - 1$ generations later (i.e., in $X_t$), and
the set $X_1 \setminus S$ of those that do not. Since the probability that a particle in $X_1$ has one
or more descendants in $X_t$ is $p_{t-1}$, the size of $S$ has a Poisson distribution with mean
$\lambda p_{t-1}$. Let us condition on $|X_t| > 0$. Then the conditional distribution of $|S|$ is that of
a Poisson distribution with mean $\lambda p_{t-1}$ conditioned on being at least 1, and we have

$$\mathbb{P}(|S| = 1 \mid |X_t| > 0) = \frac{\lambda p_{t-1} e^{-\lambda p_{t-1}}}{1 - e^{-\lambda p_{t-1}}} = \mathbb{P}(Z = 1) + O(\lambda \lambda_*^t) = \lambda_* + O(\lambda \lambda_*^t),$$

using (7) and (10) for the last step. Note for later use that the implicit constant is
independent of $\lambda$ provided $\lambda > 1$ and $\lambda$ is bounded away from 1.

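Let $r_t = \mathbb{P}(|X_t| < \omega \mid |X_t| > 0)$. If $|X_t| < \omega$, then every particle in $S$ has fewer than
$\omega$ descendants in $X_t$. Hence,

$$\begin{aligned}
r_t &\le \mathbb{P}(|S| = 1 \mid |X_t| > 0)\, r_{t-1} + \mathbb{P}(|S| > 1 \mid |X_t| > 0)\, r_{t-1}^2 \\
&= (\lambda_* + O(\lambda \lambda_*^t))\, r_{t-1} + (1 - \lambda_* + O(\lambda \lambda_*^t))\, r_{t-1}^2 \\
&= r_{t-1}(\lambda_* + (1 - \lambda_*) r_{t-1}) + O(\lambda \lambda_*^t r_{t-1}).
\end{aligned} \tag{11}$$

Setting $r_t' = r_t/\lambda_*^t$ and recalling that $\lambda$ is constant, we thus have

$$r_t' \le r_{t-1}' + \frac{1 - \lambda_*}{\lambda_*}\, r_{t-1} r_{t-1}' + O(r_{t-1}). \tag{12}$$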

Using only the trivial inequality $\mathbb{P}(0 < |X_t| < \omega') \le \mathbb{P}(0 < |X_t| < \omega)$ for $\omega' < \omega$,
the upper bound in (9) for $\omega$ at least some constant $\omega_0$ implies the same bound, with
a different constant, for all $\omega \ge 2$. Thus we may assume that $\omega$ is at least some large
constant $\omega_0$, and hence that $t_1$ is large. We may also assume $t > t_1$. By (8) we have
$\mathbb{P}(|X_{t_1}| \ge \omega) \ge c_0$ for some constant $c_0 > 0$. Hence $r_{t_1} \le 1 - c_0$ is bounded away
from 1. Choosing $\omega_0$ large enough, so $\lambda_*^t \le \lambda_*^{t_1}$ is small, the error term in (11) can be
assumed arbitrarily small relative to $r_{t-1}$. Using (11), and noting that for $t > t_1$ we
have $\lambda_* + (1 - \lambda_*) r_{t-1} \le \lambda_* + (1 - \lambda_*)(1 - c_0) < 1$, it then follows that $r_t$ decreases
exponentially as $t$ increases from $t_1$, i.e., that there is a $c_1 > 0$ (depending only on $\lambda$, not
on $\omega$) such that $r_{t_1 + t} \le e^{-c_1 t}$. Hence, $\sum_{t \ge t_1} r_t$ is bounded (independently of $\omega$). Using
(12), it follows that there is a constant $C_0$ such that for $t \ge t_1$ we have $r_t' \le C_0(r_{t_1}' + 1)$.
In other words,

$$r_t \le C_0 \lambda_*^{t - t_1} r_{t_1} + C_0 \lambda_*^t \le C_0 (1 + \lambda_*^{t_1}) \lambda_*^{t - t_1} \le 2 C_0 \lambda_*^{t - t_1} = O(\lambda_*^{t - t_1}).$$

Since 

$$\mathbb{P}(0 < |X_t| < \omega) \le \mathbb{P}(|X_t| < \omega \mid |X_t| > 0) = r_t,$$

this completes the proof of the upper bound. 

Turning to the lower bound, this is essentially trivial if $t \le t_1$: in this case,
$\mathbb{P}(0 < |X_t| < \omega)$ is bounded away from 0 by (8). We may thus assume that $t > t_1$. We shall
prove the lower bound by considering the following much more specific event $E$, the
event that $|X^+_{t - t_1}| = 1$, that the unique particle $v$ of $X^+_{t - t_1}$ has between 1 and $\omega - 1$
descendants in $X_t$, and that no other particles of $X_{t - t_1}$ have descendants in $X_t$. Clearly,
if $E$ holds then $0 < |X_t| < \omega$.

Recalling that $\mathcal{X}^+ = (X_t^+)$ is the set of particles whose descendants survive for ever,
any such particle always has at least one child by definition, and, by (7), has exactly
one child with probability $\lambda_*$. Thus

$$\mathbb{P}(|X^+_{t - t_1}| = 1) = s \lambda_*^{t - t_1}. \tag{13}$$

Given that $|X^+_{t - t_1}| = 1$, the number $N_v$ of descendants in $X_t$ of the unique particle $v$
in $X^+_{t - t_1}$ has the distribution of $|X_{t_1}|$ conditioned on the whole process surviving. From
(8), the (unconditional) probability that $|X_{t_1}|$ is between $\omega/2$ and $\omega - 1$, say, is bounded
away from zero, and the conditional probability that $\mathcal{X}_\lambda$ survives given this event is at
least $s$. Thus

$$\mathbb{P}(N_v < \omega \mid |X^+_{t - t_1}| = 1) = \mathbb{P}(|X_{t_1}| < \omega \mid \forall t : |X_t| > 0) \ge \mathbb{P}(|X_{t_1}| < \omega,\ \forall t : |X_t| > 0) \ge c_2,$$

for some positive constant $c_2$.

It remains to exclude descendants in $X_t$ of other particles in $X_{t - t_1}$. By definition,
these particles do not survive. We may construct $\mathcal{X}_\lambda$ as follows: first construct
$\mathcal{X}^+ = (X_t^+)_{t \ge 0}$. Then add in the particles that die: for each particle in each set $X_t^+$, we must
add an independent copy of $\mathcal{X}_{\lambda_*}$ rooted at this particle.

Given that $|X^+_{t - t_1}| = 1$, we have $|X_r^+| = 1$ for all $r \le t - t_1$. The probability
that the copy of $\mathcal{X}_{\lambda_*}$ started at time $r$ survives to time $t$ is $\mathbb{P}(|X^-_{t - r}| > 0) \le \lambda_*^{t - r}$.
Since the different copies are independent, the probability that all die before time $t$
is at least $\prod_{r < t - t_1} (1 - \lambda_*^{t - r})$. Now $\lambda_* < 1$, so $\sum_{r < t - t_1} \lambda_*^{t - r} = O(\lambda_*^{t_1}) = O(1)$ and
$\prod_{r < t - t_1} (1 - \lambda_*^{t - r}) \ge c_3$, for some $c_3 > 0$ depending only on $\lambda$. Hence,

$$\mathbb{P}(0 < |X_t| < \omega) \ge s \lambda_*^{t - t_1} c_2 c_3 = \Omega(\lambda_*^{t - t_1}),$$

completing the proof of the lemma. □

The above lemma tells us virtually all we need to know about the branching process 
for the 'early growth' part of the proof of Theorem 1. The next ingredient for this phase
is a lemma connecting the growth of neighbourhoods in the graph to the branching 
process. The branching process model is most relevant if the growing neighbourhood of 
a vertex remains a tree. To be sure, almost all vertices do not lie on or near a short 
cycle. However, we cannot simply ignore the exceptional vertices, since a result about 
diameter makes a statement about all vertices, not just almost all. So we must be a 
little careful. 

We deal with the problem of non-tree neighbourhoods as follows. Given a vertex $x$
of a graph $G$, let $\Gamma_t(x)$ be the set of vertices at graph distance $t$ from $x$. Let $G_{\le t}(x)$
be the subgraph of $G$ induced by $\bigcup_{t' \le t} \Gamma_{t'}(x)$, regarded as a rooted graph with root $x$.
We shall explore the neighbourhoods $\Gamma_t(x)$ in the following essentially standard way.
Fix once and for all an order on $V(G)$. Having found $\Gamma_t(x)$ (starting with $t = 0$), go
through the vertices of $\Gamma_t(x)$ one by one in the predetermined order. For each vertex
$v$ we expose all edges from $v$ to vertices not yet reached in the exploration; this means
we test each potential edge to an as yet unreached vertex for its presence; any edges
detected are called 'uncovered.' If we uncover an edge $vw$, we add $w$ to $\Gamma_{t+1}(x)$. Of
course this process correctly identifies the sets $\Gamma_t(x)$. However, it only uncovers certain
edges: let $G^0_{\le t}(x)$ denote the graph formed by the edges uncovered in our tests exploring
up to $\Gamma_t(x)$. Then $G^0_{\le t}(x)$ is a tree: it is a spanning tree in the graph $G_{\le t}(x)$.
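
The exploration just described can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the paper's procedure verbatim: the graph is given in advance as an adjacency mapping `adj` (so "testing a potential edge" becomes scanning known neighbours), the predetermined order on the vertices is the natural order of their labels, and the function name `explore` is hypothetical.

```python
def explore(adj, x):
    """Layer-by-layer exploration from x.

    Returns (layers, tree_edges): layers[t] lists the vertices at distance t
    from x in discovery order, and tree_edges lists the uncovered edges
    (v, w), i.e. those through which a new vertex w was first reached.
    """
    reached = {x}
    layers = [[x]]
    tree_edges = []
    while layers[-1]:
        next_layer = []
        for v in sorted(layers[-1]):      # fixed order on the vertices
            for w in sorted(adj[v]):
                if w not in reached:      # only edges to unreached vertices count
                    reached.add(w)
                    next_layer.append(w)
                    tree_edges.append((v, w))
        layers.append(next_layer)
    layers.pop()  # drop the trailing empty layer
    return layers, tree_edges


if __name__ == "__main__":
    four_cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
    print(explore(four_cycle, 0))
```

On the 4-cycle the edge between vertices 2 and 3 is never uncovered, because vertex 2 has already been reached via vertex 1; the uncovered edges therefore form a spanning tree of the explored component.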

In the following results, $X_{\le t}$ denotes the union of generations 0 to $t$ of the branching
process $\mathcal{X}_\lambda$, regarded as a rooted tree with root the initial particle, and $\cong$ denotes
isomorphism of rooted trees.

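Lemma 5. Let $\lambda > 0$ be fixed. For any rooted tree $T$ with $|T| < n/2$ we have

$$\mathbb{P}(G^0_{\le t}(x) \cong T) = e^{O(|T|^2/n)}\, \mathbb{P}(X_{\le t} \cong T)$$

and

$$\mathbb{P}(G_{\le t}(x) \cong T) = e^{O(|T|^2/n)}\, \mathbb{P}(X_{\le t} \cong T),$$

where the implicit constants depend only on $\lambda$.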

Proof. This is well known and easy to prove. The first statement follows from the
natural step-by-step coupling between $G^0_{\le t}(x)$ and the branching process, where each
step investigates the children (of a vertex or a particle, respectively). Suppose we have
reached $r - a$ vertices in total so far. Then the probabilities of finding $a$ vertices in the
next step are $p_1 = \binom{n - r + a}{a} (\lambda/n)^a (1 - \lambda/n)^{n - r}$ and $p_2 = e^{-\lambda} \lambda^a / a!$ in the two models.
The ratio of these probabilities is

$$p_1/p_2 = (n - r + a)_{(a)}\, n^{-a} (1 - \lambda/n)^{-r} e^{\lambda} (1 - \lambda/n)^n = e^{O(ar/n + r\lambda/n + \lambda^2/n)},$$

where $x_{(a)} = x(x - 1) \cdots (x - a + 1)$. The sum of $ar$ or $r$ over all vertices in the tree is
trivially at most $|T|^2$, so it follows that

$$\frac{\mathbb{P}(G^0_{\le t}(x) \cong T)}{\mathbb{P}(X_{\le t} \cong T)} = \exp\bigl(O(|T|^2/n + \lambda |T|^2/n + \lambda^2 |T|/n)\bigr). \tag{14}$$

Since $\lambda$ is fixed, this proves the first statement.

If $G_{\le t}(x) \cong T$, then $G_{\le t}(x)$ is a tree, so $G^0_{\le t}(x) = G_{\le t}(x)$. Hence $G_{\le t}(x) \cong T$
implies $G^0_{\le t}(x) \cong T$. Given that $G^0_{\le t}(x) \cong T$, the probability that none of the untested
edges between the $|T|$ vertices found is also present is again $e^{O(|T|^2/n)}$. So the second
statement follows from the first. □
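
The single-step comparison used in the proof of Lemma 5 is easy to examine numerically. The Python sketch below evaluates the two step probabilities from the proof, the binomial one for the graph exploration and the Poisson one for the branching process; the function name and the sample parameters are illustrative assumptions of this example.

```python
import math


def step_probabilities(n, lam, r, a):
    """Return (p1, p2) for finding `a` new vertices when r - a have been reached.

    p1 is the probability in the graph exploration (binomial), and
    p2 the probability in the Poisson branching process.
    """
    p1 = math.comb(n - r + a, a) * (lam / n) ** a * (1 - lam / n) ** (n - r)
    p2 = math.exp(-lam) * lam ** a / math.factorial(a)
    return p1, p2


if __name__ == "__main__":
    p1, p2 = step_probabilities(10 ** 6, 2.0, 10, 3)
    print(p1, p2, p1 / p2)
```

For $r$ and $a$ small compared with $n$ the ratio is very close to 1, in line with the exponent $O(ar/n + r\lambda/n + \lambda^2/n)$ appearing in the proof.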

Using Lemmas 4 and 5, we can study the initial rate of growth of the neighbourhoods
of the vertices of $G(n, \lambda/n)$. The first step is to show that these neighbourhoods cannot
stay small but non-empty for too long. The basic picture is that after about

$$t_1 = \lfloor \log \omega / \log \lambda \rfloor \tag{15}$$

steps, we expect a typical vertex neighbourhood to expand to size approximately $\omega$. It
is very unlikely that there are any vertices in the graph whose neighbourhoods expand
to some reasonable size, say around $\log n$, and then fail to expand to size $\omega$ in roughly
the expected time from that point. However, some unusual vertices take up to

$$t_0 = \lfloor \log n / \log(1/\lambda_*) \rfloor \tag{16}$$

steps before they begin the initial stages of expansion.
We argue this more precisely as follows. 

Set $\omega = (\log n)^6$, say, and define $t_0$ and $t_1$ as above. Let $K = K(n)$ tend to infinity
slowly (for instance, slower than $\log \log n$).
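
To get a feel for the sizes of these quantities, the following Python sketch evaluates $\omega$, $t_0$ and $t_1$ for concrete inputs. The function name is hypothetical, and the dual parameter is passed in as an argument rather than computed; the value 0.4064 used in the demonstration is an approximation to the dual parameter of $\lambda = 2$.

```python
import math


def time_scales(n, lam, lam_star):
    """Return (omega, t0, t1) with omega = (log n)^6 and t0, t1 as in (16), (15)."""
    log_n = math.log(n)
    omega = log_n ** 6
    t1 = math.floor(math.log(omega) / math.log(lam))
    t0 = math.floor(log_n / math.log(1.0 / lam_star))
    return omega, t0, t1


if __name__ == "__main__":
    print(time_scales(10 ** 12, 2.0, 0.4064))
```

Note that $(\log n)^6$ is smaller than $n$ only for quite large $n$, so the choice of $\omega$ is meaningful only asymptotically.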

For each vertex $x$, let $B_1(x)$ be the 'bad' event that $1 \le |\Gamma_{t'}(x)| < \omega$ holds for
$0 \le t' \le t = t_0 + t_1 + K$. The event $B_1(x)$ is a disjoint union of events of the form
$G^0_{\le t}(x) \cong T$, where each tree $T$ has size at most $t\omega = o(\sqrt{n})$. Also, the corresponding
union of the events $X_{\le t} \cong T$ is the event that $0 < |X_{t'}| < \omega$ holds for all $t' \le t$. Hence,
by Lemma 5,

$$\mathbb{P}(B_1(x)) \sim \mathbb{P}(\forall t' \le t : 0 < |X_{t'}| < \omega) \le \mathbb{P}(0 < |X_t| < \omega) = O(\lambda_*^{t_0 + K}), \tag{17}$$

where the last step is from Lemma 4.

Let $B_1$ be the event that $B_1(x)$ holds for some $x$. Then

$$\mathbb{P}(B_1) \le n\,\mathbb{P}(B_1(x)) = O(n \lambda_*^{t_0 + K}) = O(\lambda_*^K) = o(1). \tag{18}$$

We now move on to the 'regular growth' part of the proof. That is, our next aim
is to show that once the neighbourhoods of a vertex $x$ reach size $\omega$, with very high
probability they then grow at a predictable rate until they reach size comparable with
$n$. We shall use the following convenient form of the Chernoff bounds on the binomial
distribution; see [23], for example.

Lemma 6. Let $Y$ have a binomial distribution with parameters $n$ and $p$. If $0 < \delta < 1$
then

$$\mathbb{P}(|Y - np| \ge \delta np) \le 2 e^{-\delta^2 np/3}.$$

□ 
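
As a sanity check on Lemma 6, the Python sketch below compares the exact two-sided binomial tail with the stated bound for a few parameter choices. The function names and the parameters are illustrative; the exact tail is computed by direct summation, which is practical only for moderate $n$.

```python
import math


def two_sided_tail(n, p, delta):
    """Exact P(|Y - np| >= delta * np) for Y ~ Bin(n, p), by direct summation."""
    mean = n * p
    return sum(
        math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k - mean) >= delta * mean
    )


def chernoff_bound(n, p, delta):
    """The bound 2 * exp(-delta^2 * n * p / 3) of Lemma 6."""
    return 2 * math.exp(-delta ** 2 * n * p / 3)


if __name__ == "__main__":
    for n, p, delta in [(200, 0.1, 0.5), (500, 0.05, 0.3), (1000, 0.02, 0.9)]:
        print(n, p, delta, two_sided_tail(n, p, delta), chernoff_bound(n, p, delta))
```

The bound is far from tight for small $\delta^2 np$, but it decays exponentially in $\delta^2 np$, which is what the regular growth argument needs.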

Let $0 < \delta < 1/1000$ be an arbitrary (small) constant. Let us say that a vertex $x$ has
regular large neighbourhoods if one of the following holds: either $|\Gamma_t(x)| < \omega$ for all $t$, or,
setting $t^- = \min\{t : |\Gamma_t(x)| \ge \omega\}$ and $t^+ = t^- + \log(n^{3/4}/\omega)/\log \lambda$, we have

$$(1 - \delta) \lambda^{t - t^- + 1} |\Gamma_{t^- - 1}(x)| \le |\Gamma_t(x)| \le (1 + \delta) \lambda^{t - t^- + 1} |\Gamma_{t^- - 1}(x)|$$

for $t^- \le t \le t^+$. In other words, the neighbourhoods grow by almost exactly a factor
of $\lambda$ at each step from just before the first time they reach size $\omega$ until they reach size
around $n^{3/4}$. Note that since we start from the last 'small' neighbourhood $\Gamma_{t^- - 1}(x)$, the
growth condition above certainly implies that

$$\frac{1 - \delta}{1 + \delta}\,\lambda \le \frac{|\Gamma_t(x)|}{|\Gamma_{t-1}(x)|} \le \lambda\,\frac{1 + \delta}{1 - \delta} \tag{19}$$

holds for $t^- \le t \le t^+$.

Let $B_2(x)$ be the 'bad' event that a given vertex $x$ of $G(n, \lambda/n)$ fails to have regular
large neighbourhoods, and $B_2 = \bigcup_x B_2(x)$ the global bad event that not all vertices
have regular large neighbourhoods.

Lemma 7. For each fixed vertex $x$ of $G(n, \lambda/n)$ we have $\mathbb{P}(B_2(x)) = o(n^{-1})$. Thus
$\mathbb{P}(B_2) = o(1)$.
Proof. This is well known (cf. Janson, Łuczak and Ruciński [24, Section 5.2]), and
essentially trivial from the Chernoff bounds (or Hoeffding's inequality); we nevertheless
give the details. We explore the successive neighbourhoods of $x$ in $G(n,\lambda/n)$ in the
usual way, writing $a_t$ for $|\Gamma_t(x)|$. Conditional on $a_0, a_1, \ldots, a_t$, the distribution of $a_{t+1}$
is binomial with parameters $n - m$ and $p = 1 - (1 - \lambda/n)^{a_t}$, where $m = \sum_{t' \le t} a_{t'}$ and
$p$ is the probability that one of the undiscovered vertices is adjacent to at least one
member of $\Gamma_t(x)$. Assuming that $m = O(n^{3/4})$, say, we have $\mathbb{E}(a_{t+1} \mid a_0, \ldots, a_t) =
\lambda a_t(1 + O(n^{-1/4}))$. It then follows from Lemma 6 that, conditional on $a_0, \ldots, a_t$, if
$a_t \ge \omega/(100\lambda)$ then we have

$$\mathbb{P}\left(\left|\frac{a_{t+1}}{\lambda a_t} - 1\right| \ge \frac{1}{(\log n)^2}\right) = e^{-\Omega((\log n)^{-4} a_t)} = o(n^{-100}),$$

using $\omega = (\log n)^6$. Similarly, if $a_t < \omega/(100\lambda)$ then $\mathbb{P}(a_{t+1} \ge \omega) \le n^{-100}$.

Let $t^-$ be the first $t$ with $a_t \ge \omega$, if such a $t$ exists. We have already shown above that
the probability that $0 < a_t < \omega$ holds for all $t$ up to $t_0 + t_1 + K = O(\log n)$ is $o(n^{-1})$,
so with probability $1 - o(n^{-1})$ either $t^-$ is undefined, in which case there is nothing
to prove, or $t^- = O(\log n)$, in which case we have so far uncovered $O(t^-\omega) = o(n^{3/4})$
vertices. From the estimates above, with very high probability $a_{t^--1} \ge \omega/(100\lambda)$, and,
from this point on, the ratios $a_{t+1}/a_t$ are within a factor $1 + O((\log n)^{-2})$ of $\lambda$ until $a_t$
first exceeds $n^{3/4}$. It follows that $x$ has regular large neighbourhoods with probability
$1 - o(n^{-1})$, as claimed. □
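The exploration used in the proof of Lemma 7 can be sketched in a few lines. The following is only an illustrative simulation, assuming the binomial description above: given $a_0, \ldots, a_t$, the next level size is sampled as Bin$(n - m, p)$ with $p = 1 - (1 - \lambda/n)^{a_t}$. The function name and the naive Bernoulli sampling are our own choices, made to keep the sketch free of external dependencies.

```python
import random


def explore_level_sizes(n, lam, max_steps, rng):
    """Simulate a_t = |Gamma_t(x)| for one start vertex of G(n, lam/n).

    Uses the conditional law from the proof: a_{t+1} ~ Bin(n - m, p) with
    m the number of vertices discovered so far and p = 1 - (1 - lam/n)^{a_t}.
    """
    sizes = [1]          # a_0 = 1: the start vertex itself
    discovered = 1       # m = a_0 + ... + a_t
    for _ in range(max_steps):
        a_t = sizes[-1]
        if a_t == 0:
            break        # the component has been exhausted
        # chance that a fixed undiscovered vertex is adjacent to the current level
        p = 1.0 - (1.0 - lam / n) ** a_t
        remaining = n - discovered
        # naive binomial sample: one Bernoulli trial per undiscovered vertex
        a_next = sum(1 for _ in range(remaining) if rng.random() < p)
        sizes.append(a_next)
        discovered += a_next
    return sizes
```

Running `explore_level_sizes(2000, 3.0, 30, random.Random(1))` returns one random sequence of level sizes; once the sizes are moderately large, successive ratios are typically close to $\lambda$ until a constant fraction of the vertices has been discovered.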

Note that we took $\omega$ as large as $(\log n)^6$ just to simplify the estimates. If we are a
little more careful, a large constant times $\log n$ will in fact do: significant deviations in
the ratio $a_{t+1}/a_t$ are only likely near the beginning, so we can bound these ratios above
and below by sequences approaching 1 geometrically with high enough probability.

We now move on to the third phase of the proof, where we consider the meeting
up of neighbourhoods of different vertices and hence the distance between them. This
still involves a careful look at the early development of neighbourhoods, since, from
the second phase of the proof, we know that vertices with large close neighbourhoods
will have large distant neighbourhoods. We treat the upper and lower bounds in
Theorem 1 separately.




2.1 Upper bound 

As above, set $\omega = (\log n)^6$, say, and let $K = K(n)$ tend to $\infty$ slowly.

For $x \in V(G)$ let $t_\omega(x) = \min\{t : |\Gamma_t(x)| \ge \omega\}$, if this minimum exists; otherwise
$t_\omega(x)$ is undefined. Note that if the event $B_1$ defined above does not hold, then whenever
$t_\omega(x)$ is defined, we have $t_\omega(x) \le t_0 + t_1 + K$.
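For concreteness, here is a minimal sketch of how $t_\omega(x)$ could be computed on an explicit graph by breadth-first search. The adjacency-dictionary representation and the function names are assumptions made for the illustration; returning `None` plays the role of "undefined".

```python
def level_sizes(adj, x):
    """Return [|Gamma_0(x)|, |Gamma_1(x)|, ...] by breadth-first search from x.

    `adj` maps each vertex to an iterable of its neighbours.
    """
    seen = {x}
    frontier = [x]
    sizes = []
    while frontier:
        sizes.append(len(frontier))
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        frontier = nxt
    return sizes


def t_omega(adj, x, omega):
    """First t with |Gamma_t(x)| >= omega, or None if no level is that large."""
    for t, size in enumerate(level_sizes(adj, x)):
        if size >= omega:
            return t
    return None
```

On a complete binary tree of depth 2 rooted at vertex 0, the level sizes from the root are 1, 2, 4, so `t_omega(adj, 0, 4)` is 2 while `t_omega(adj, 0, 5)` is undefined; from a leaf the neighbourhoods stay small for longer, which is the phenomenon the events $E_{x,y,i,j}$ below keep track of.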

Set

$$t_2 = \lfloor \log(n/\omega^2)/\log\lambda \rfloor,$$

and, for $x, y \in V(G)$, let $E_{x,y,i,j}$ be the event that $t_\omega(x) = t_0 + t_1 - i$, $t_\omega(y) = t_0 + t_1 - j$,
and $d(x,y) \ge t_\omega(x) + i + t_\omega(y) + j + t_2 + 3K + c_0$ all hold, where $c_0 > 2$ is some constant.
Our next aim is to bound the probability of the event $E_{x,y,i,j}$ for given vertices $x$ and $y$
and given $i, j \ge -K$.
Recall that $B_2$ is the event that not all vertices have regular large neighbourhoods.
We claim that

$$\mathbb{P}(E_{x,y,i,j} \setminus B_2) \le O(\lambda_*^{t_0-i}\lambda_*^{t_0-j})\,e^{-c\lambda^{3K+i+j}} = O(n^{-2}\lambda_*^{-i-j}e^{-c\lambda^{3K+i+j}}) \tag{20}$$

holds for some $c > 0$. First, arguing as in the proof of (17), using Lemma 2 and a
version of Lemma 5 where we start with two vertices and compare with two copies of
the branching process, we see that

$$\mathbb{P}\bigl(t_\omega(x) = t_0 + t_1 - i,\ t_\omega(y) = t_0 + t_1 - j,\ d(x,y) > t_\omega(x) + t_\omega(y)\bigr) = O(\lambda_*^{t_0-i}\lambda_*^{t_0-j}).$$

Exploring the neighbourhoods of $x$ and $y$ in the obvious way, suppose we find that
$t_\omega(x) = t_0 + t_1 - i$, $t_\omega(y) = t_0 + t_1 - j$, and $d(x,y) > t_\omega(x) + t_\omega(y)$, i.e., our explorations
have not yet met. Continuing the exploration of the two neighbourhoods a further
$\ell = \lceil (t_2 + 3K + i + j)/2 \rceil$ steps in each case, we may assume that with respect to the
neighbourhoods of $x$ and $y$ that have been revealed so far, the regular large neighbourhood
condition has not yet been violated. (If it has been, the event $B_2$ must hold, and
we are bounding the probability of an event contained in the complement of $B_2$.) Then

$$\min\{|\Gamma_{t_\omega(x)+\ell}(x)|,\ |\Gamma_{t_\omega(y)+\ell}(y)|\} \ge 0.99\,\omega\lambda^{\ell} = \Omega\bigl(\sqrt{n\lambda^{3K+i+j}}\bigr).$$

It may be that $d(x,y) \le t_\omega(x) + \ell + t_\omega(y) + \ell$, in which case we are done. Otherwise,
the edges between $\Gamma_{t_\omega(x)+\ell}(x)$ and $\Gamma_{t_\omega(y)+\ell}(y)$ have not yet been tested, so the chance
that no such edge is present is

$$(1 - \lambda/n)^{|\Gamma_{t_\omega(x)+\ell}(x)|\,|\Gamma_{t_\omega(y)+\ell}(y)|} \le e^{-(\lambda/n)\,\Omega(n\lambda^{3K+i+j})} \le e^{-c\lambda^{3K+i+j}}$$

for some constant $c > 0$. Multiplying by the $O(\lambda_*^{t_0-i}\lambda_*^{t_0-j})$ bound obtained above gives
the bound in (20).



Let $B$ be the event that

$$\mathrm{diam}(G) \ge 2t_0 + 2t_1 + t_2 + 3K + 10 \ge 2\frac{\log\omega}{\log\lambda} + 2\frac{\log n}{\log(1/\lambda_*)} + \frac{\log n - 2\log\omega}{\log\lambda} + 3K = \frac{\log n}{\log\lambda} + 2\frac{\log n}{\log(1/\lambda_*)} + 3K.$$

Our aim is to prove that with $K \to \infty$ arbitrarily slowly, we have $\mathbb{P}(B) = o(1)$; in order
to do so, it suffices to show that $\mathbb{P}(B \setminus (B_1 \cup B_2)) = o(1)$.

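Suppose that $B$ holds but $B_1 \cup B_2$ does not, and let $x$ and $y$ be vertices at maximum
distance. Since $B_1$ does not hold, and $d(x, y)$ is so large, exploring successive neighbourhoods
of $x$ and $y$, these neighbourhoods both reach size at least $\omega$ before they meet.
Hence $E_{x,y,i,j}$ holds for some $i$ and $j$. Since $B_1$ does not hold, $E_{x,y,i,j}$ can only hold if
$i, j \ge -K$. Hence,

$$\begin{aligned}
\mathbb{P}(B \setminus (B_1 \cup B_2)) &\le \sum_{i,j \ge -K}\ \sum_{x,y \in V(G)} \mathbb{P}(E_{x,y,i,j} \setminus B_2) \le n^2 \sum_{i,j \ge -K} n^{-2}\,O\bigl(\lambda_*^{-i-j}e^{-c\lambda^{3K+i+j}}\bigr) \\
&= O\Bigl(\sum_{r \ge -2K} (r + 2K + 1)\lambda_*^{-r}e^{-c\lambda^{3K+r}}\Bigr) = O(e^{-c\lambda^{K}}) = o(1),
\end{aligned}$$

completing the proof of the upper bound.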
Remark 2. Note that one cannot prove the upper bound directly by the first moment
method; a separate argument excluding very long thin neighbourhoods (bounding the
probability of $B_1$) is needed. Indeed, it is not too hard to show that the estimates above
are essentially tight. Thus, if for some $r > K$ there happens to be a vertex $x$ with
$t_\omega(x) = t_0 + t_1 + r$, say, an event of probability around $\lambda_*^r$, then $x$ will be at distance
roughly $d = 2t_0 + 2t_1 + t_2 + K$ from many of the roughly $(1/\lambda_*)^{r-K}$ vertices $y$ with
$t_\omega(y) = t_0 + t_1 - r + K$. Since there are $\Theta(\log n)$ possible values of $r$, the expected
number of pairs of vertices at distance $d$ will tend to infinity if $K \to \infty$ slowly enough.

2.2 Lower bound 

The idea of the lower bound is simple. Let $S$ be the set of vertices $x$ with $t_\omega(x) \ge
t_0 + t_1 - K$. Then, from the arguments in the previous section, the expected size of $S$ is
roughly $(1/\lambda_*)^K$, which tends to infinity. We would like to show that $|S|$ is large with
high probability using the second moment method. Since two vertices in S are likely to 
be far apart, the result will follow. There are two problems. A minor one is that the 
events that different vertices lie in S are not that close to independent: vertices in S 
will usually be located in trees attached to the 2-core, and S roughly corresponds to the 
set of vertices at least a certain distance from the 2-core. Although most trees attached 
to the 2-core will contain no such vertices, it turns out that, on average, each tree 
contributing one or more such vertices contributes some constant number larger than 
1, so is not well approximated by a Poisson distribution. A more serious, related, 
problem is that to find vertices at large distance we need to find vertices in S whose 
short-range neighbourhoods do not overlap, i.e., vertices coming from different trees. 
We solve both these problems by looking for vertices $x \in S$ satisfying an additional
condition, the strong wedge condition, that usually corresponds to x being the unique 
vertex in its tree at maximal distance from the 2-core. 

Note that as we are now looking for a lower bound on the diameter, we do not need 
to consider all promising pairs of vertices for our candidate vertices at large distance. 
We may thus impose additional conditions as convenient, and our result will still be 
sharp enough as long as these conditions are likely enough to be satisfied. One such 
condition is that the neighbourhoods are trees up to a suitable distance. 

Let $x \in V(G)$, and suppose that $G_{\le t}(x)$ is a tree for some $t > 0$. The weak/strong
wedge condition holds from $x$ to $x_2 \in \Gamma_t(x)$ if for every $z \ne x$ in the graph $G_{\le t}(x)$, the
distance from $z$ to the closest vertex $y$ on the unique path from $x$ to $x_2$ is at most/strictly
less than the distance from $x$ to $y$. Note that either condition implies that the degree of
$x$ in $G$ must be 1. In this section we shall always work with the strong wedge condition
(or rather a related 'diamond' condition); the weak wedge condition will play a role in
Section HI 

Let $t_K$ denote $t_0 - K$, where $K = K(n) \to \infty$ arbitrarily slowly, in particular
with $K \le \log\log n$, and let $W^0_x$ be the event that $G_{\le t_K}(x)$ is a tree with the following
properties: there is a unique vertex $x_2$ at distance $t_K$ from $x$, and the strong wedge
condition holds from $x$ to $x_2$. Let $W_x$ be the event that $W^0_x$ holds and $G_{\le t_K}(x)$ contains
fewer than $\omega/2$ vertices, where $\omega = (\log n)^6$ as before.

Note for later that if $W_x$ (or $W^0_x$) holds, then the tree $G_{\le t_K}(x)$ consists of an $x$-$x_2$
path $P_x$ of length $t_K$ with a (possibly empty) set of trees attached to each interior vertex,
the height of each tree being strictly less than the distance to the nearest end-vertex of
$P_x$. (Thus $W^0_x$ is a sort of 'diamond' condition.) It follows that the diameter of $G_{\le t_K}(x)$
is $t_K$, and that $x$ and $x_2$ are the unique pair of vertices of $G_{\le t_K}(x)$ at this distance.

Let $W^0$ and $W$ be the branching process events corresponding to $W^0_x$ and $W_x$, so
$W^0$ is the branching process version of our diamond condition. The event that $W$ holds
is a disjoint union of events that $X_{\le t_K}$ is one of certain trees with at most $\omega/2 = o(n^{1/2})$
vertices, so by Lemma 5 we have $\mathbb{P}(W_x) \sim \mathbb{P}(W)$.

Once the branching process reaches size $(\log n)^4$, it is very unlikely ever to shrink
down to size 1, and in fact the probability that $W^0$ holds but one of the first $t_K =
t_0 - K$ generations has size at least $(\log n)^4$ is $o(n^{-100})$. (This follows from the proof of
Lemma 3 but is much simpler.) Assuming this does not happen, the sum of sizes of the
first $t_0 - K$ generations is at most $t_0(\log n)^4 = O(\log^5 n)$. It follows that $\mathbb{P}(W^0 \setminus W) =
o(n^{-100})$, so

$$\mathbb{P}(W) = \mathbb{P}(W^0) + o(n^{-100}). \tag{21}$$

To calculate $\mathbb{P}(W^0)$, consider the event $W'$, that $W^0$ holds and the unique particle
in generation $t_0 - K$ survives. Note that

$$\mathbb{P}(W') = s\,\mathbb{P}(W^0). \tag{22}$$

If $W'$ holds, then $|X_t^+| = 1$ for $t = t_0 - K$ and hence for $t = 0, 1, \ldots, t_0 - K$, an
event of probability $s\lambda_*^{t_0-K}$. Conversely, constructing $\mathfrak{X}_\lambda$ as before by starting from $\mathfrak{X}^+$
and adding in independent copies of $\mathfrak{X}_{\lambda_*} = (X_t^-)$ started at each particle, $W'$ holds
if and only if $|X^+_{t_0-K}| = 1$ and, for $0 \le t < t_0 - K$, the copy of $\mathfrak{X}_{\lambda_*}$ started at the
unique particle of $X_t^+$ dies within $\min\{\max\{t, 1\}, t_0 - K - t\}$ generations: dying within
$t_0 - K - t$ generations ensures that $|X_{t_0-K}| = |X^+_{t_0-K}| = 1$, and, for $t > 0$, dying within
$t$ generations ensures that the strong wedge condition holds. Let $d_t = \mathbb{P}(|X_t^-| = 0)$ be
the probability that the subcritical process dies within $t$ generations. Then we have

$$\mathbb{P}(W') = s\lambda_*^{t_0-K}\, d_1 \prod_{t=1}^{t_0-K-1} d_{\min\{t,\, t_0-K-t\}} = s\lambda_*^{t_0-K}\, d_1\, d_1^2 d_2^2 \cdots d_{\lfloor (t_0-K)/2 \rfloor}^{\theta}, \tag{23}$$

where the exponent $\theta$ of the last factor is 1 or 2 depending on the parity of $t_0 - K$. As
we shall see, the later factors in the product are essentially irrelevant. Indeed,

$$1 - d_t = \mathbb{P}(|X_t^-| > 0) \le \mathbb{E}(|X_t^-|) = \lambda_*^t, \tag{24}$$

so $1 - \lambda_*^t \le d_t \le 1$, and $\log d_t = -O(\lambda_*^t)$. Since $\sum_t \lambda_*^t$ is convergent, we thus have
$\prod_t d_t = \Theta(1)$, so $\mathbb{P}(W') = \Theta(\lambda_*^{t_0-K})$ and, using (21) and (22),

$$\mathbb{P}(W) = \mathbb{P}(W^0) + o(n^{-100}) = s^{-1}\mathbb{P}(W') + o(n^{-100}) = \Theta(\lambda_*^{t_0-K}).$$

Since $\lambda_*^{t_0}$ is of order $1/n$ and $K \to \infty$, it follows that $n\mathbb{P}(W) \to \infty$, and hence that
$n\mathbb{P}(W_x) \to \infty$.

Recalling that $t_1 = \lfloor \log\omega/\log\lambda \rfloor$, set $t = t_K = t_0 - K$, and let $W_x^+$ be the event that
$W_x$ holds, $|\Gamma_{t+t_1}(x)| \ge \omega$, and $|V(G_{\le t+t_1}(x))| \le \omega^2$. If $W_x$ holds, then exploring the
neighbourhoods of $x$ to distance $t$ we have by definition reached at most $\omega/2$ vertices.
Using US), it is easy to show that $\mathbb{P}(W_x^+ \mid W_x) = \Theta(1)$, so $\mathbb{P}(W_x^+) = \Theta(\mathbb{P}(W_x))$.
Let $N$ be the number of vertices $x$ for which $W_x^+$ holds, so $\mathbb{E}(N) = n\mathbb{P}(W_x^+) =
\Theta(n\mathbb{P}(W_x)) \to \infty$. We shall use the second moment method to show that $N$ is
concentrated about its mean. The argument is slightly more complicated than one might
expect (or hope for); while one can give simpler arguments that are very plausible, we 
have so far failed to turn such an argument into a rigorous proof. In fact, the argument 
we do present deals with all issues of possible dependence with very little calculation. 

Suppose that $x$ and $y$ are distinct vertices and that $W_x^+$ and $W_y^+$ both hold. Then
$W_x$ and $W_y$ also hold. Our immediate aim is to show that the subgraphs $G_{\le t}(x)$ and
$G_{\le t}(y)$ must be edge disjoint, i.e., they can meet only if $x_2 = y_2$, and then only at this
one vertex. We shall write $W_x \star W_y$ for the event that $W_x$ and $W_y$ hold, and $G_{\le t}(x)$
and $G_{\le t}(y)$ are edge disjoint. In other words $W_x \star W_y = W_x \cap W_y \cap \{d(x, y) \ge 2t\}$. We
define $W_x^+ \star W_y^+$ similarly, so $W_x^+ \star W_y^+ = W_x^+ \cap W_y^+ \cap \{d(x, y) \ge 2t + 2t_1\}$. (One must
be careful here: when $W_x$ holds, $G_{\le t}(x)$ is not a certificate for this event in the sense of
the van den Berg-Kesten box product [5], i.e., specifying that this particular subgraph
is present as an induced subgraph does not guarantee that $W_x$ holds. To guarantee
$W_x$, one must also certify that various edges are absent, from $G_{\le t-1}$ to vertices outside
$G_{\le t}$; such certificates for $W_x$ and $W_y$ can never be disjoint, so we cannot simply apply
Reimer's Theorem [30] to bound $\mathbb{P}(W_x \star W_y)$.)

Still assuming that $x$ and $y$ are distinct vertices such that $W_x^+$ and $W_y^+$ both hold,
suppose first that $y$ lies strictly inside $G_{\le t}(x)$, i.e., that $y \in V(G_{\le t}(x)) \setminus \{x_2\}$. As noted
earlier, since $W_x$ holds, $G_{\le t}(x)$ has diameter $t$, and this diameter is realized uniquely
by $x$ and $x_2$. Thus the vertex $y_2$, which is at distance $t$ from $y$, must lie outside $G_{\le t}(x)$.
But then the unique $y$-$y_2$ path $P_y$ passes through $x_2$. Considering the vertex $z$ where $P_y$
first meets $P_x$, the strong wedge condition for $x$ gives $d(y,z) < d(x,z)$. But the strong
wedge condition for $y$ gives $d(x, z) < d(y, z)$, a contradiction.

We may thus assume that $y$ lies outside $V(G_{\le t}(x)) \setminus \{x_2\}$. Suppose now that $y_2$ also
lies outside this set, and that $y_2 \ne x_2$. Since $x_2$ is a cutvertex, it follows that all of $P_y$ is
outside $V(G_{\le t}(x)) \setminus \{x_2\}$. If $G_{\le t}(x)$ and $G_{\le t}(y)$ meet, then, since $x_2$ is a cutvertex, $x_2$
must be a vertex of $G_{\le t}(y)$. Furthermore, since $G_{\le t}(y)$ consists of $y_2$ plus a component
of $G \setminus \{y_2\}$, all of $G_{\le t}(x)$ lies in $G_{\le t}(y) \setminus \{y_2\}$. In particular $x \in G_{\le t}(y) \setminus \{y_2\}$ and we
obtain a contradiction as above.

If $x_2 = y_2$, then each of $G_{\le t}(x)$ and $G_{\le t}(y)$ is formed by $x_2$ together with a tree
component of $G - x_2$. Since each of $x$ and $y$ is the unique vertex at maximal distance
from $x_2 = y_2$ within its tree, and $x \ne y$, these components are different, and so disjoint,
so $W_x \star W_y$ holds.

We may thus assume that, if $W_x \star W_y$ fails, then $y$ lies outside $V(G_{\le t}(x)) \setminus \{x_2\}$ but
$y_2$ is inside. It is easy to check that in this case $G_{\le t}(x) \cup G_{\le t}(y)$ forms a component
of $G$ (and actually $y_2$ must lie on the path from $x$ to $x_2$). Since $W_x^+ \cap W_y^+$ holds, this
component has size at least $\omega$, but $W_x \cap W_y$ also holds, so it has size less than $2\omega/2$, a
contradiction.

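We have just shown that if $W_x^+ \cap W_y^+$ holds, then so does $W_x \star W_y$. It follows that
either $W_x^+ \star W_y^+$ holds, or $d(x_2, y_2) < 2t_1$, implying $d(x, y) < 2(t_0 - K + t_1)$. Thus, for
the second moment,

$$\mathbb{E}N^2 - \mathbb{E}N = \sum_x \sum_{y \ne x} \mathbb{P}(W_x^+ \cap W_y^+) \le \sum_x \sum_{y \ne x} \mathbb{P}(W_x^+ \star W_y^+) + \sum_x \sum_{y \ne x} \mathbb{P}\bigl(W_x \star W_y \cap \{d(x,y) < 2(t_0 - K + t_1)\}\bigr).$$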

For the first term, we have $\mathbb{P}(W_x^+ \star W_y^+) \sim \mathbb{P}(W_x^+)\mathbb{P}(W_y^+)$, since testing the event $W_x^+$
uses up at most $\omega^2$ vertices, which does not affect the probability of $W_y^+$ significantly.
(Alternatively, as before we may use Lemma 5 and a version of this lemma where we
start at two vertices.)

To handle the second term, we use the following inequality, which we shall prove in
a moment:

$$\mathbb{P}\bigl(W_x \star W_y \cap \{d(x, y) \le 2(t_0 - K) + t_3 - K'\}\bigr) = o(n^{-2}), \tag{25}$$

where $K' = 3K\log(1/\lambda_*)/\log\lambda$ and $t_3 = \log n/\log\lambda$. Assuming this, using the fact
that $2t_1 < t_3 - K'$ for large $n$ if $K$ tends to infinity sufficiently slowly, we have $\mathbb{E}N^2 \le
\mathbb{E}N + (1 + o(1))(\mathbb{E}N)^2 + n^2 o(n^{-2})$. Since $\mathbb{E}N \to \infty$, it follows that $\mathbb{E}N^2 \sim (\mathbb{E}N)^2$, so
by Chebyshev's inequality $N$ is concentrated about its mean, and in particular, $N \ge 2$
whp.

Set $d = 2(t_0 - K) + t_3 - K'$, so $d = \log n/\log\lambda + 2\log n/\log(1/\lambda_*) - O(K)$. With
$K$ tending to infinity arbitrarily slowly, our aim in this subsection is to prove that
$\mathrm{diam}(G) \ge d$ holds whp.

Let $M$ be the number of pairs of distinct vertices $x, y$ for which $W_x \star W_y \cap \{d(x, y) < d\}$
holds. Using (25) again, we have $\mathbb{E}M = o(1)$, so $M = 0$ whp. Thus, whp, we have
$N \ge 2$ and $M = 0$. Then there are distinct vertices $x, y$ for which $W_x^+$ and $W_y^+$ hold.
As shown above, it then follows that $W_x \star W_y$ holds. Since $M = 0$, we have $d(x, y) \ge d$.
Now whp all non-giant components have size $O(\log n)$, and under such conditions $W_x^+$
and $W_y^+$ imply $x$ and $y$ are both in the giant component. So $\mathrm{diam}(G) \ge d$, as required.

It remains only to prove (25). To do so, we explore the neighbourhoods of a given
pair $x, y$ of vertices as usual, to test whether $W_x \star W_y$ holds. If so, the possible edges
between the remaining vertices, including $x_2$ and $y_2$, have not yet been tested, so each
is present with its original unconditional probability. Hence, given $W_x \star W_y$, summing
over all possible paths we see that the probability that $d(x_2, y_2) \le \ell$ is at most

$$\sum_{k \le \ell} n^{k-1}(\lambda/n)^k = \sum_{k \le \ell} \lambda^k/n = O(\lambda^{\ell}/n)$$

and

$$\begin{aligned}
\mathbb{P}\bigl(W_x \star W_y \cap \{d(x, y) \le 2(t_0 - K) + t_3 - K'\}\bigr)
&\le (1 + o(1))\,\mathbb{P}(W_x)\mathbb{P}(W_y)\,O(\lambda^{t_3 - K'}/n) \\
&= O(\lambda_*^{t_0-K}\lambda_*^{t_0-K}\lambda^{-K'}) \\
&= O(1/n^2)(1/\lambda_*)^{2K}\lambda^{-K'} \\
&= O(1/n^2)(1/\lambda_*)^{2K}\lambda_*^{3K} = o(n^{-2}),
\end{aligned}$$

as required.
Combining the lower bound on the diameter we have just proved, and the upper
bound proved in Subsection 2.1, we obtain Theorem 1.

3 Average degree tending to infinity 

In this section we shall prove Theorem 2. Throughout, when we consider $G(n,\lambda/n)$
we assume that $\lambda = \lambda(n) \to \infty$ with $\lambda \le n^{1/1000}$. For convenience, we always assume
that $\lambda$ is larger than some absolute constant $\lambda_0$, chosen so that the various statements
'provided $\lambda$ is large enough' in what follows hold for $\lambda \ge \lambda_0$. With $\lambda$ tending to infinity,
some aspects of the proof become easier than the $\lambda$ constant case, whilst some become
more difficult.

We retain the same basic plan of attack as for the case of $\lambda$ constant. One of the
main problems is that we cannot simply work with the time that the neighbourhoods
of a vertex take to reach a certain size $\omega$, since the first neighbourhood larger than
this may have size anywhere from $\omega$ to around $\lambda\omega$; this difference is too big for our
later arguments. Instead we will look at the size of the neighbourhoods at a specific
time. We could consider sizes in certain ranges, but it turns out that we can simply
consider individual sizes, bounding the probability that a certain neighbourhood has
exactly a certain size $r$. Roughly speaking, as in the previous section, the probability
that the neighbourhoods of a vertex take $a$ generations longer than usual to reach (or
exceed) some given size turns out to be around $\lambda_*^a$, where $\lambda_* < 1$ is the dual branching
process parameter, defined by $\lambda_* e^{-\lambda_*} = \lambda e^{-\lambda}$. This event corresponds to the (later)
neighbourhoods being a factor of $\lambda^a$ smaller than usual. So we study for real parameters
$a$ the probability that the neighbourhoods are $\lambda^a$ smaller than usual, writing this as a
power of $\lambda_*$.
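To get a feel for the size of the dual parameter, the following sketch solves $\lambda_* e^{-\lambda_*} = \lambda e^{-\lambda}$ numerically by bisection on $(0,1)$, using that $x \mapsto xe^{-x}$ is strictly increasing there. The function name and the number of bisection steps are illustrative choices only.

```python
import math


def dual_parameter(lam):
    """Return lam_star in (0, 1) with lam_star*exp(-lam_star) == lam*exp(-lam).

    Assumes lam > 1. Bisection works because x*exp(-x) is increasing on [0, 1].
    """
    if lam <= 1:
        raise ValueError("lam must exceed 1")
    target = lam * math.exp(-lam)
    lo, hi = 0.0, 1.0
    for _ in range(200):  # far more halvings than double precision needs
        mid = (lo + hi) / 2
        if mid * math.exp(-mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For $\lambda = 2$ this gives $\lambda_* \approx 0.41$, while for larger $\lambda$ the value is very close to $\lambda e^{-\lambda}$, in line with the fact that $\lambda_* \to 0$ rapidly as $\lambda \to \infty$.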

3.1 Branching process preliminaries

We first give some lemmas describing the growth behaviour of the branching process. 

Lemma 8. Suppose $\lambda > 10$ and $0 < \delta < 1/2$. Given that $|X_r| = k \ge 1$, with probability
at least $1 - e^{-c\delta^2\lambda k}$ we have $|X_t|/(\lambda^{t-r}k) \in [1 - \delta, 1 + \delta]$ for all $t \ge r$, where $c > 0$ is an
absolute constant.

Proof. We may assume without loss of generality that $r = 0$. For $t \ge 0$, let $\rho_t =
|X_{t+1}|/(\lambda|X_t|)$, and let $E_t$ be the event that $|\rho_t - 1| \ge \delta/3^t$; it suffices to prove that
$\mathbb{P}(\bigcup_t E_t) \le e^{-c\delta^2\lambda k}$. Let $F_t$ be the event that $E_t$ holds but no $E_s$ holds, $s < t$, so
$\mathbb{P}(\bigcup_t E_t) = \sum_t \mathbb{P}(F_t)$. If no $E_s$ holds for $s < t$, then $|X_t| \ge k\lambda^t\prod_{s<t}(1 - \delta/3^s) \ge k\lambda^t/10$.
Turning to $|X_{t+1}|$, conditional on $|X_t|$, by Lemma 6 the probability that $\rho_t$ lies outside
$[1 - \delta/3^t, 1 + \delta/3^t]$ is at most $\exp(-c_0\delta^2 9^{-t}\lambda|X_t|)$, for some $c_0 > 0$. Hence $\mathbb{P}(F_t) \le
\exp(-c_0\delta^2 9^{-t}\lambda^{t+1}k/10)$, and the result follows by summing this rapidly decreasing
sequence. □

For $0 \le a < 1$ define $g(a) = g(\lambda, a)$ by $\lambda_*^{g(a)} = \mathbb{P}(Z \le \lambda^{1-a})$, where $Z$ has a Poisson
distribution with mean $\lambda$. Thus $\lambda_*^{g(a)}$ is the probability that $Z$ is smaller than its mean
by a factor $\lambda^a$ or more. Note that $g(a)$ is (weakly) increasing in $a$. Also, as $\lambda \to \infty$
we have $\mathbb{P}(Z \le \lambda) \sim 1/2$ and $\lambda_* \to 0$, so $g(0) = o(1)$. A simple calculation shows that
$g(a) = 1 - o(1)$ for any fixed $0 < a < 1$. Also, using $\lambda_* = \lambda e^{-\lambda} + O(\lambda^2 e^{-2\lambda})$ we have
$\mathbb{P}(Z \le 1) = (1 + \lambda)e^{-\lambda} = \lambda_* e^{-\lambda_*}(1 + 1/\lambda) > \lambda_*$ for large enough $\lambda$, and so

$$0 < g(a) < 1 \tag{26}$$

for all $0 \le a < 1$.
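The definition of $g$ is easy to evaluate numerically. The sketch below computes $g(a) = \log \mathbb{P}(Z \le \lambda^{1-a}) / \log\lambda_*$ for $0 \le a < 1$, with $\lambda_*$ found by bisection; all function names are our own, and the value $\lambda = 20$ used in the usage note is just an illustrative choice of a moderately large $\lambda$.

```python
import math


def dual_parameter(lam):
    """Solve x*exp(-x) = lam*exp(-lam) for x in (0, 1) by bisection (lam > 1)."""
    target = lam * math.exp(-lam)
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid * math.exp(-mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def poisson_cdf(lam, x):
    """P(Z <= x) for Z Poisson with mean lam, by direct summation."""
    if x < 0:
        return 0.0
    term = math.exp(-lam)      # P(Z = 0)
    total = term
    for k in range(1, math.floor(x) + 1):
        term *= lam / k        # P(Z = k) from P(Z = k - 1)
        total += term
    return total


def g(lam, a):
    """g(a) for 0 <= a < 1, defined by lam_star**g(a) = P(Z <= lam**(1 - a))."""
    return math.log(poisson_cdf(lam, lam ** (1 - a))) / math.log(dual_parameter(lam))
```

With $\lambda = 20$, evaluating `g(20.0, a)` at $a = 0, 0.25, 0.5, 0.75, 0.99$ gives an increasing sequence of values strictly between 0 and 1, small at $a = 0$ and close to 1 as $a$ approaches 1, consistent with (26) and the remarks above.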

Extend $g$ to the real line by defining $g(x) = \lfloor x \rfloor + g(x - \lfloor x \rfloor)$; this gives an increasing
function which, from (26), satisfies

$$\lfloor x \rfloor < g(x) < \lfloor x \rfloor + 1 \tag{27}$$

for all $x$. It is straightforward to check that for any constant $b \ge 3$, say, if $n$ is large
enough then

$$\lambda_*^{g(a - \log b/\log\lambda)} \ge \lambda^{b/4}\lambda_*^{g(a)} \tag{28}$$

holds for all $a$. Indeed, if $m \le a - \log b/\log\lambda$, $a < m + 1$ for some integer $m$, then (28)
decodes to a statement of the form $\mathbb{P}(Z \le bk) \ge \lambda^{b/4}\mathbb{P}(Z \le k)$, where $1 \le k \le \lambda/b$;
the inequality is easily verified by considering, for example, the ranges $k \ge \lambda/(10b)$,
$\sqrt{\lambda} \le k \le \lambda/(10b)$, and $1 \le k \le \sqrt{\lambda}$. On the other hand, if $a - \log b/\log\lambda < m \le a$
then it decodes to $\lambda_*^{-1}\mathbb{P}(Z \le kb/\lambda) \ge \lambda^{b/4}\mathbb{P}(Z \le k)$, with $k \le \lambda$ and $bk \ge \lambda$; which is
easily verified by considering the cases $k \ge 0.9\lambda$ and $k < 0.9\lambda$, say.
We next give an analogue of the upper bound in Lemma HI 

Lemma 9. Suppose that $\omega > \lambda$ and that $t \ge 0$ is an integer. Then for $\lambda$ at least some
absolute constant, setting $t_1 = \log\omega/\log\lambda$ we have

$$\mathbb{P}(0 < |X_t| < \omega/2) \le 3\lambda_*^{g(t - t_1)}.$$

Proof. Note first that if $t < t_1$, then $g(t - t_1) < 0$ by (27), so the result holds trivially.
We may thus assume that $t \ge t_1$, so $t \ge \lceil t_1 \rceil$. We postpone the case $t = \lceil t_1 \rceil$ until the
end of the proof.

Case 1: $t \ge \lceil t_1 \rceil + 1$.

Similar to the proof of Lemma HI set $r_t = \mathbb{P}(|X_t| < \omega/2 \mid |X_t| > 0)$. Then it suffices
to show that $r_t \le 3\lambda_*^{g(t - t_1)}$. We shall show in a moment that if $t = \lceil t_1 \rceil + 1$, then

$$\mathbb{P}(0 < |X_t| < \omega/2) \le 1.1\,\lambda_*^{g(t - t_1)}. \tag{29}$$

Suppose for the moment that this holds. Then by monotonicity of $g$ and the fact that
$g(1) > 1$, and since $\mathbb{P}(|X_t| > 0) \sim 1$, for such $t$ we have $r_t \le 1.2\lambda_*$ if $\lambda$ is at least some
(absolute) constant.

As noted in the proof of Lemma HI the implicit constant in all $O(\cdot)$ notation leading
to (11) may be taken to be absolute when $\lambda > 1$ is bounded away from 1, so this bound
applies with $\lambda$ growing as a function of $n$. In particular, from (11), we have for arbitrary
$t \ge 1$

r t < r t _!(A* + r t _i + 0(AA*)) = r t _x(A, + r t _i + o^" 1 )) (30) 

We may iterate this, with $\lambda_*$ sufficiently small in the following (as $\lambda$ can be assumed
large). Beginning with $t = \lceil t_1 \rceil + 1$, when $r_t \le 1.2\lambda_*$ from (29) and hence $r_{t+1} \le 2.5\lambda_*^2$
from (30), we see that $r_t$ decreases extremely rapidly: $r_t \le 3\lambda_* r_{t-1}$ for $t > \lceil t_1 \rceil + 1$.
Feeding the resulting bound $r_{\lceil t_1 \rceil + k} \le (3\lambda_*)^k$, $k \ge 1$, back into (30), it follows that for
$t > \lceil t_1 \rceil + 1$ we have $r_t \le r_{t-1}\lambda_*(1 + \varepsilon_t)$ where the first error term is at most 1.3
and later ones decrease extremely rapidly. Since $\prod_t (1 + \varepsilon_t) \le 1.4$ for $\lambda$ large enough,
the result for Case 1 now follows from (29).

It remains to prove (29). Assuming now that $t = \lceil t_1 \rceil + 1$, put $a = t - t_1 - 1$, so that
$0 \le a < 1$. We claim that

$$\mathbb{P}(0 < |X_2| \le \lambda^{1-a}) \sim \lambda_*^{1+g(a)} \tag{31}$$

and that

$$\mathbb{P}(|X_2| > \lambda^{1-a},\ |X_t| < \omega/2) = o(\lambda_*^{1+g(a)}). \tag{32}$$

Since $1 + g(a) = g(t - t_1)$, these imply (29).

Note that $\mathbb{P}(|X_1| = 1) = \lambda e^{-\lambda} \sim \lambda_*$, and the probability that subsequently $|X_2| \le
\lambda^{1-a}$ is $\lambda_*^{g(a)}$ by definition of $g$. Thus,

$$\mathbb{P}(|X_1| = 1,\ |X_2| \le \lambda^{1-a}) \sim \lambda_*^{1+g(a)}.$$

On the other hand, conditioning on $|X_1| = k \ge 2$, the event that $|X_2| \le \lambda^{1-a}$ is
contained in the event $F_1$, that at least half of the particles in $X_1$ have at most $\frac{1}{2}\lambda^{1-a}$
children, and also in the event $F_2$, that each particle in $X_1$ has at most $\lambda^{1-a}$ children.
Since $a \ge 0$, given $|X_1| = k$ where $k \ge 2$, the probability of $F_1$ is at most $\mathbb{P}(Z \ge k/2)$
where $Z$ has the binomial distribution $\mathrm{Bi}(k, p)$ and $p = \mathbb{P}(\mathrm{Po}(\lambda) \le \lambda/2) \le e^{-c_1\lambda}$ for
some absolute constant $c_1 > 0$. (Recall that $\lambda$ is sufficiently large throughout.) It
follows that there is a $c_2 > 0$ such that for $k \ge 2$, $\mathbb{P}(Z \ge k/2) \le e^{-c_2 k\lambda}$. So there
exists an absolute constant $k_0$ such that for $k \ge k_0$, $\mathbb{P}(Z \ge k/2) \le e^{-3\lambda} = o(\lambda_*^2)$. Hence
$\mathbb{P}(F_1 \mid |X_1| \ge k_0) = o(\lambda_*^{1+g(a)})$ and thus

$$\mathbb{P}(|X_1| \ge k_0,\ |X_2| \le \lambda^{1-a}) \le \mathbb{P}\bigl((|X_1| \ge k_0) \wedge F_1\bigr) = o(\lambda_*^{1+g(a)}).$$

Turning to $2 \le k < k_0$, we have $\mathbb{P}(|X_1| = k) = O(\lambda^{k-1}\lambda_*)$. Suppose firstly that
$g(a) \le c_2$ as defined above. Then $\mathbb{P}((2 \le |X_1| < k_0) \wedge F_1) \le O(\lambda^{k_0-1}\lambda_*)e^{-2c_2\lambda} =
\lambda_*^{1+2c_2+o(1)} = o(\lambda_*^{1+g(a)})$. So we may assume that $g(a) > c_2$. Then

$$\mathbb{P}(|X_1| = k)\,\mathbb{P}(F_2 \mid |X_1| = k) = O(\lambda^{k-1}\lambda_*)\lambda_*^{k g(a)} = o(\lambda_*^{1+g(a)}),$$

since $\lambda\lambda_*^{g(a)} = \lambda_*^{g(a)-o(1)} = o(1)$. Putting the
pieces together, we have established (31).

The proof of (32) is similar. Condition on $|X_2| = k$, where $k > \lambda^{1-a}$. In the event
that $|X_t| < \omega/2$, the average number of descendants in $X_t$ of a particle in $X_2$ is less
than $\omega/(2\lambda^{1-a})$. Hence, this event is contained in the event $F$ that at least one third of
the particles in $X_2$ have at most $3\omega/(4\lambda^{1-a})$ descendants in $X_t$. However, we know that
any one such particle expects about $\lambda^{t-2} = \lambda^{t_1+a-1} = \omega/\lambda^{1-a}$ such descendants, and
applying Lemma 8 to the subprocess originating from one such particle, the probability
that it has at most $3\omega/(4\lambda^{1-a})$ descendants in $X_t$ is at most $e^{-c_3\lambda}$ for some $c_3 > 0$.
Arguing as for the proof of (31), there exists $k_1$ such that $\mathbb{P}(F \mid |X_2| \ge k_1) = o(\lambda_*^{1+g(a)})$.

We are left with showing (32) in the case that $\lambda^{1-a} < |X_2| < k_1$, which requires
$|X_2| \ge 2$ since $a < 1$. It is easy to see that $\mathbb{P}(0 < |X_2| < k_1) = O(\lambda^{k_1}\lambda_*^2) = \lambda_*^{2-o(1)}$.
Conditional upon this, the event whose probability we are bounding requires $F$ as defined
above, which in this case requires at least one particle to have the small number of
descendants in $X_t$, which has probability at most $k_1 e^{-c_3\lambda} = o(\lambda_*^{c_3/2})$. Hence

$$\mathbb{P}(0 < |X_2| < k_1)\,\mathbb{P}\bigl(|X_t| < \omega/2 \bigm| \lambda^{1-a} < |X_2| < k_1\bigr) = o(\lambda_*^{2-o(1)+c_3/2}) = o(\lambda_*^2)$$

and we have (32) since $1 + g(a) < 2$.
Case 2: $t = \lceil t_1 \rceil$.

In this case we have $\mathbb{P}(0 < |X_1| \le \lambda^{1-a}) \le \lambda_*^{g(a)}$ by definition of $g$. Using this in
place of (31), it suffices to show that $\mathbb{P}(|X_1| > \lambda^{1-a},\ |X_t| < \omega/2) = o(\lambda_*^{g(a)})$; the proof
is identical to that of (32), apart from the notation. □

We next turn to the analogue of the lower bound in Lemma HI as there, we bound 
the probability of a rather specific event involving extra conditions that will be needed 
in our lower bound on the diameter. 

As in Subsection 2.2, we say that the branching process $(X_t)$ satisfies the diamond
condition to generation $r$ if $|X_1| = 1$, there is a unique particle $x_r$ in $X_r$, and the chain
$x_0 x_1 \cdots x_r$ of ancestors of $x_r$ is such that any 'side branches' starting from $x_i$ die within
$\min\{i, r - i\}$ further generations. For $r = 0$ we interpret the diamond condition to hold
vacuously.

Lemma 10. Let $t' \ge 0$ be an integer, and $0 \le a < 1$ a real number. Let $F_0$ be the event
that $|X_{t'}| = 1$ and the diamond condition holds to generation $t'$, and let $F_1$ be the event
that $|X_{t'+1}| \le \lambda^{1-a}$. Then as $\lambda \to \infty$ we have

$$\mathbb{P}(F_0 \cap F_1) \sim \lambda_*^{g(t'+a)},$$

uniformly in $t'$ and $a$. Furthermore, provided $\lambda$ is at least some absolute constant, then
for any $\omega > \lambda$ and $t \ge t_1 = \log\omega/\log\lambda$ there is a $p$ with $\omega/3 \le p \le 2\omega$ such that

$$\mathbb{P}(F_0 \cap F_1 \cap \{|X_t| = p\}) \ge \lambda_*^{g(t - t_1)}/(3\lambda\omega),$$

where $F_0$ and $F_1$ are defined as above with $t'$ and $a$ the integer and fractional parts of
$t - t_1$, respectively.

Note that $t_1$ is not rounded to an integer. Essentially, the lemma says that the
probability that $\mathfrak{X}_\lambda$ survives but (after some time) is a factor $\lambda^x$ smaller than it should
be is around $\lambda_*^{g(x)}$. The second statement shows that there is some specific size in a
suitable range such that the probability of hitting exactly this size is not much smaller.

Proof. The event $F_0$ is exactly the event $W^0$ referred to in (21), but with $t_0 - K$
replaced by $t'$. Using (22) to translate (23) back in terms of $W^0$, we have $\lambda_*^{t'} \ge \mathbb{P}(F_0) \ge
\lambda_*^{t'} d_1 d_1^2 d_2^2 d_3^2 \cdots$, where $d_t = \mathbb{P}(|X_t^-| = 0)$ is at least $1 - \lambda_*^t$ from (24). Since $\lambda_* \to 0$, it
follows that $\mathbb{P}(F_0) \sim \lambda_*^{t'}$.

Conditioning on $F_0$ says nothing about the descendants of the unique particle $z$ in
$X_{t'}$, so if $Z$ is Poisson with mean $\lambda$ then

$$\mathbb{P}(F_1 \mid F_0) = \mathbb{P}(Z \le \lambda^{1-a}) = \lambda_*^{g(a)},$$

where the last step is the definition of $g(a)$. Since $\lambda_*^{t'}\lambda_*^{g(a)} = \lambda_*^{g(t'+a)}$, this proves the
first statement.

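Turning to the second statement, suppose that $\omega > \lambda$ and $t \ge t_1 = \log\omega/\log\lambda$. Let
$t' = \lfloor t - t_1 \rfloor$ and $a = t - t_1 - t'$. Let $F_1'$ be the event that $|X_{t'+1}| = \lfloor \lambda^{1-a} \rfloor$. Noting that
$x = \lfloor \lambda^{1-a} \rfloor$ is the most likely value $x$ of $Z$ with $x \le \lambda^{1-a}$, arguing as above we have
$\mathbb{P}(F_1' \mid F_0) \ge \lambda_*^{g(a)}/\lambda$, and hence $\mathbb{P}(F_0 \cap F_1') \ge \lambda_*^{g(t'+a)}/(2\lambda)$, provided $\lambda$ is large enough.

Noting that $t - t' \ge t_1 > 1$, let $F_2$ be the event that the ratio $|X_t|/(\lambda^{t-t'-1}|X_{t'+1}|)$
is between 9/10 and 11/10. Then by Lemma 8 we have $\mathbb{P}(F_2 \mid F_0 \cap F_1') \to 1$, so

$$\mathbb{P}(F_0 \cap F_1' \cap F_2) \ge \lambda_*^{g(t'+a)}/(3\lambda).$$

Noting that $\lambda^{1-a}\lambda^{t-t'-1} = \lambda^{t-(t'+a)} = \lambda^{t-(t-t_1)} = \lambda^{t_1} = \omega$, and that $\lfloor \lambda^{1-a} \rfloor \ge \lambda^{1-a}/2$, if
$F_0 \cap F_1' \cap F_2$ holds then so does the event $E_p = F_0 \cap F_1 \cap \{|X_t| = p\}$ for some $p$ between
$9\omega/20$ and $11\omega/10$. So there is some $p$ in this range for which $\mathbb{P}(E_p) \ge \lambda_*^{g(t - t_1)}/(3\lambda\omega)$,
as required. □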

We also need an analogue of Lemma 5 without the assumption that $\lambda$ is fixed.

Lemma 11. Let $\lambda = \lambda(n)$ satisfy $\lambda \le n^{1/10}$. Then the estimates

$$\mathbb{P}(G_{\le t}(x) \cong T) \sim \mathbb{P}(G^0_{\le t}(x) \cong T) \sim \mathbb{P}(X_{\le t} \cong T)$$

hold uniformly over rooted trees $T$ with $|T| \le n^{2/5}$, where $t$ is the height of $T$.

Proof. The proof is essentially identical to that of Lemma 5. Indeed, the estimate (14) is
valid assuming only that $|T|, \lambda \le n/2$, say; under our present assumptions this estimate
is $\exp(O(n^{-1/5} + n^{-1/10} + n^{-2/5})) = 1 + o(1)$. As before, the result for $G_{\le t}(x)$ follows, now
noting that the expected number of untested edges present is $O(\lambda|T|^2/n) = o(1)$. □

3.2 Neighbourhoods in the graph and how they meet 

Our immediate plan is to examine those vertices for which the breadth first search
procedure takes an unusually long time to reach a 'large' number of vertices. For convenience
we choose 'large' to mean around $\lambda^{10}$, since we assume $\lambda \le n^{1/1000}$ say; $\lambda^{10}$ is much less
than say $n^{1/4}$. We do not attempt to optimise the power of $n$ giving the upper bound
on $\lambda$. We first work towards a lemma that gives asymptotically the probability that two
neighbourhoods of size at least $\lambda^9/4$ have a certain distance between them. This will
be needed in particular later when we make variance calculations accurately enough to
use the second moment method.
As in Section 2, set

$$t_0 = \lfloor \log n/\log(1/\lambda_*) \rfloor.$$

For $r \ge 1$, let $S_r$ be the set of vertices $x$ in the random graph with $|\Gamma_{t_0+10}(x)| = r$.

Lemmas 10 and 9, in conjunction with Lemma 11, give some information on the expected size of $S_r$, or, more precisely, on the size of unions of such sets over $r$ in suitable ranges, though (as will be apparent in the argument below) the upper and lower bounds given by the lemmas can differ by a factor of $\lambda$ or more.

We first consider the branching process. For $r \ge \lambda$, setting $\omega = 3r \ge 2r$ in Lemma 9 gives

$$P(0 < |X_t| \le r) \le 3\lambda_*^{g(t - \log(3r)/\log\lambda)}. \tag{33}$$

Although we shall not use it, let us note that in the other direction, with a constant 
and A large enough, applying Lemma [TOl with w = \ar and then Lemma [8] gives 

P(0 < \X t \ < ar) > }_ X f- loe{ar/2)/loeX \ 

provided the argument of g is greater than 0. 

We will transfer the bounds above to the random graph using Lemma 11, which shows that the corresponding random graph and branching process events have asymptotically the same probability, provided there are not too many vertices close to $x$, so that the trees used in applying Lemma 11 are not too large. First, define $\Gamma_{\le t}(x) = \bigcup_{i=0}^{t}\Gamma_i(x)$.

Let $B_1$ be the ('bad') set of vertices $x$ such that $|\Gamma_{\le t_0+10}(x)| > n^{1/4}$. Since $\log(1/\lambda_*) \sim \lambda$, which is much larger than $\log\lambda$, we have $\lambda^{t_0+10} = n^{o(1)}\lambda^{10} \le n^{1/8}$ if $n$ is large enough. For fixed $k \ge 1$, the number of unlabelled rooted trees of height $t$ with exactly $k$ (non-root) leaves, all at distance $t$ from the root, can be estimated by adding paths to leaves one at a time, giving the crude upper bound $O(1)(t+1)^{k-1}$. It is thus easily seen that for fixed $k$ we have $E|\Gamma_{\le t}(x)|^k \le O(1)(t+1)^{k-1}\lambda^{tk}$. With $t = t_0 + 10 = O(\log n)$ and $k = 20$, this gives $E|\Gamma_{\le t_0+10}(x)|^{20} \le (\log n)^{O(1)}n^{2.5} = o(n^3)$. Thus Markov's inequality gives

$$E|B_1| \le nP(|\Gamma_{\le t_0+10}(x)|^{20} \ge n^5) = o(n^{-1}). \tag{34}$$

A similar calculation shows that

$$P(B_1^0) = o(n^{-2}), \tag{35}$$

where $B_1^0$ is the branching process event corresponding to $B_1$.

Define $\tilde\mu_r$ to be the expected number of vertices $x$ in $S_r \setminus B_1$. Applying Lemma 11 to each relevant tree, which has at most $n^{1/4}$ vertices by definition, and summing over $x$, we have $\tilde\mu_r \le (1+o(1))nP(|X_{t_0+10}| = r)$, so by (33) we have

$$\tilde\mu_r \le n(3+o(1))\lambda_*^{g(t_0+10-\log(3r)/\log\lambda)} \le (3+o(1))\lambda_*^{g(9-\log(3r)/\log\lambda)} \tag{36}$$

using (27).

We can similarly see easily that the union of the sets $S_r \setminus B_1$ over all $r < \lambda^9/4$ is whp empty: setting $\omega = \lambda^9/2$ in Lemma 9 gives

$$P(0 < |X_{t_0+10}| < \lambda^9/4) \le 3\lambda_*^{g(t_0+1+\log 2/\log\lambda)} \le 3\lambda_*^{g(\log 2/\log\lambda)}/n,$$

which is $3n^{-1}P(\mathrm{Po}(\lambda) \le \lambda/2) = o(1/n)$. Hence, using Lemma 11 again, $\sum_{r<\lambda^9/4}\tilde\mu_r = o(1)$. Since $E|B_1| = o(1)$, it follows that

$$\bigcup_{1 \le r < \lambda^9/4} S_r = \emptyset \quad\text{whp}. \tag{37}$$

Thus, we are interested in $S_r$ for $r \ge \lambda^9/4$.

An annoying feature of the present situation is that with some small probability, the size of $\Gamma_i(x)$ can 'misbehave' for $i > t_0+10$. Although there are whp no vertices for which this happens to a significant extent, we need to treat these vertices separately. Define $\ell(r) = \max\{0, \lceil\log(2(\log^6 n)/r)/\log\lambda\rceil\}$, so $\ell(r) \ge 0$ is minimal subject to $r\lambda^{\ell(r)} \ge 2\log^6 n$. Let $B_2$ be the set of 'bad' vertices $x$ with the property that $r = |\Gamma_{t_0+10}(x)| \ge \lambda^9/4$, and the "error ratio"

$$\frac{|\Gamma_{t_0+10+i}(x)|}{r\lambda^i} - 1$$

has absolute value at least $\lambda^{-2}$ for some $0 \le i \le \ell(r)$. We write $V_0 = V \setminus (B_1 \cup B_2)$ for the set of 'good' vertices.
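To make the definition of $\ell(r)$ concrete, here is a small sketch that evaluates it and checks the minimality property stated above. The function name `ell` and the sample values of $n$, $\lambda$ and $r$ are illustrative assumptions, not taken from the paper.

```python
import math

def ell(r: float, n: int, lam: float) -> int:
    """Smallest integer l >= 0 with r * lam**l >= 2 * (log n)**6."""
    threshold = 2 * math.log(n) ** 6
    if r >= threshold:
        return 0
    return max(0, math.ceil(math.log(threshold / r) / math.log(lam)))

# Illustrative values only: the neighbourhood size r is multiplied by
# lam at each further step until it exceeds 2 * (log n)**6.
n, lam, r = 10**6, 20.0, 1000.0
steps = ell(r, n, lam)
```

Once a neighbourhood of size $r$ has been followed for $\ell(r)$ further steps, its expected size is at least $2\log^6 n$, which is the regime in which the Chernoff-type estimates used below are strong enough.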

Lemma 12. (a) $E|B_2| = o(1)$.

(b) Conditional on two vertices $x$ and $y$ being in $S_{r_1} \cap V_0$ and $S_{r_2} \cap V_0$ respectively, where $\lambda^9/4 \le r_i \le n^{1/4}$ ($i = 1$ and 2), and additionally conditional on $d(x,y) > 2t_0 + 20 + \ell(r_1) + \ell(r_2)$, we have

$$P\big(d(x,y) > 2t_0 + 20 + k\big) = \exp\Big(-\frac{r_1 r_2}{n}(1+O(\lambda^{-2}))\sum_{i=1}^{k}\lambda^i\Big) + o(n^{-3})$$

for all $k \ge \ell(r_1) + \ell(r_2)$, where the constant implicit in the $O(\cdot)$ terms is uniform over all such $r_1$, $r_2$ and $k$.

Proof. As in the proof of Lemma 7, we explore the successive neighbourhoods of a vertex. If $a_t$ denotes $|\Gamma_t(x)|$, then conditional on the part of the graph explored up to this point, and assuming that it contains at most $n^{2/3}$ vertices, $a_{t+1}$ is distributed as binomial with parameters $n - O(n^{2/3})$ and $p = (\lambda a_t/n)(1 + O(\lambda a_t/n))$. The mean is $\lambda a_t(1 + O(n^{-1/4}))$, so by Lemma 6 (a Chernoff bound) we have

$$P\big(|a_{t+1} - \lambda a_t| \le (\lambda a_t)^{3/4}\big) = 1 - e^{-\Omega(\sqrt{\lambda a_t})}. \tag{38}$$

To prove (a), in view of (34) we only need to show that $E|B_2 \setminus B_1| = o(1)$. First explore the successive neighbourhoods of any vertex $x$ up to $\Gamma_{t_0+10}(x)$. If the cardinality of this set, $a_{t_0+10}$, is less than $\lambda^9/4$ or greater than $2\log^6 n$, or if $|\Gamma_{\le t_0+10}(x)| > n^{1/4}$, then $x$ is certainly not in $B_2 \setminus B_1$. Condition on the exploration so far assuming that none of these events hold, and that $a_{t_0+10} = r$, so $\lambda^9/4 \le r \le 2\log^6 n$. Next, continue exploring a further $\ell(r)$ steps. Provided the event in the left side of (38) holds at each exploration step, the 'relative error' $|a_{t+1}/(\lambda a_t) - 1|$ is at most $(\lambda a_t)^{-1/4}$. In this case,

$$\left|\frac{a_{t_0+10+j}}{\lambda^j a_{t_0+10}} - 1\right| \le 2(\lambda a_{t_0+10})^{-1/4},$$

which is less than $\lambda^{-2}$ since $a_{t_0+10} \ge \lambda^9/4$. This implies $x \notin B_2$. On the other hand, the probability that the event in the left side of (38) fails to hold for at least one of the relevant $t$ is at most $e^{-\Omega(\sqrt{\lambda r})}$, which is $\lambda_*^{\Omega(\sqrt{r/\lambda})}$ since $\lambda_* \ge e^{-\lambda}$. The expected number of vertices $x \notin B_1$ with $a_{t_0+10} = r$ is $O(\lambda_*^{8-\log(3r)/\log\lambda})$ by (36). Multiplying these bounds together and summing over $r \ge \lambda^9/4$ gives $o(1)$, that is, $E|B_2 \setminus B_1| = o(1)$ as required.

We turn to (b). Let $\lambda^9/4 \le r_i \le n^{1/4}$ ($i = 1$ and 2). Take any vertices $x$ and $y$, and explore the successive neighbourhoods of each up to $t_0 + 10 + \ell(r_1)$ and $t_0 + 10 + \ell(r_2)$ respectively. At this point, it is revealed whether these neighbourhoods are all disjoint, which is equivalent to $d(x,y) > 2t_0 + 20 + \ell(r_1) + \ell(r_2)$, and also (recalling that $V_0 = V \setminus (B_1 \cup B_2)$) whether $x \in V_0$ and $y \in V_0$. Condition on the event that all three of these hold. It follows from $x \notin B_2$ that $|\Gamma_{t_0+10+i}(x)| = r_1(1 + O(\lambda^{-2}))\lambda^i$ for $0 \le i \le \ell(r_1)$, and similarly for $y$.

We next explore the further neighbourhoods of $x$ and $y$, each time choosing the smaller of the two for further exposure, until one of them has reached cardinality at least $n^{3/5}$, or until they meet, whichever happens first. Note that for all $r$ we have $r\lambda^{\ell(r)} \ge 2\log^6 n$ by the definition of $\ell$. Since $x \notin B_2$, it follows that $|\Gamma_{t_0+10+\ell(r_1)}(x)| \ge \log^6 n$ (since it must be very close to its expected value), and similarly for $y$. So, by applying (38) and conditioning on non-failure at each step, we conclude that with probability at least $1 - o(n^{-3})$, at each step

$$|\Gamma_{t_0+10+k}(x)| = r_1\lambda^k(1 + O(\lambda^{-2})),$$

and similarly for $y$. So we may assume this is the case each time. From this, when the sum of the two distances is $2t_0 + 20 + k - 1$, the product of the sizes of the neighbourhoods is $r_1 r_2\lambda^{k-1}(1 + O(\lambda^{-2}))$, and hence the probability of not joining in the next step is

$$\exp\big(-r_1 r_2\lambda^{k-1}(1 + O(\lambda^{-2}))\lambda/n\big).$$

The result follows, as long as the probability that they do not meet by the time one of the neighbourhoods has reached size $n^{3/5}$ is bounded above by $o(n^{-3})$. This must be the case since on the previous step, the neighbourhood that was extended must have had size at least $n^{3/5}/(\lambda(1+o(1)))$, so the product of sizes on the previous step must have been at least $n^{6/5}/(\lambda^2(1+o(1)))$, which is at least $n^{11/10}$ as $\lambda \le n^{1/1000}$. Thus the probability of not joining on the last step was at most $\exp(-\lambda n^{1/10}(1+o(1))) = o(n^{-3})$. □
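The product of the per-step non-joining probabilities in the proof above gives the exponential in Lemma 12(b). As a rough numerical illustration, the sketch below evaluates only the leading term, dropping the $(1 + O(\lambda^{-2}))$ factor and the $o(n^{-3})$ error; the function name and the sample parameters are assumptions made for illustration.

```python
import math

def prob_still_apart(r1: float, r2: float, n: float, lam: float, k: int) -> float:
    """Leading-order value of exp(-(r1*r2/n) * sum_{i=1}^{k} lam**i).

    This ignores the (1 + O(lam**-2)) factor and the o(n**-3) additive
    error in Lemma 12(b), so it is only a heuristic approximation.
    """
    geometric_sum = sum(lam ** i for i in range(1, k + 1))
    return math.exp(-r1 * r2 * geometric_sum / n)

# Illustrative parameters: the probability falls from near 1 to near 0
# as k passes log(n / (r1 * r2)) / log(lam), which equals 5 here.
values = [prob_still_apart(100, 100, 10**9, 10.0, k) for k in range(0, 8)]
```

The sharp transition in `values` reflects why the diameter is concentrated on so few values: the sum is dominated by its last term $\lambda^k$, so one extra step changes the exponent by a factor of about $\lambda$.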

We now turn to the proof of Theorem 2.

Proof of Theorem 2. Recall that $\lambda = \lambda(n)$ is some given function of $n$ satisfying $\lambda \to \infty$ and $\lambda \le n^{1/1000}$. All limits are as $n \to \infty$, or, equivalently, as $\lambda \to \infty$. As usual, all inequalities we claim are required to hold only if $n$ (or $\lambda$) is sufficiently large.

Our first aim is to estimate the probability of the event that is conditioned on in Lemma 12(b). Let $P_r$ denote the probability that a given vertex is in $V_0 \cap S_r$, and $P_{r_1,r_2}$ the probability that a given pair of distinct vertices $x$ and $y$ satisfy $x \in V_0 \cap S_{r_1}$, $y \in V_0 \cap S_{r_2}$, and $d(x,y) > 2t_0 + 20 + \ell(r_1) + \ell(r_2)$. Note that $x \in V_0 \cap S_{r_1}$ iff the set of vertices of distance at most $t_0 + 10 + \ell(r_1)$ from $x$ forms one of a specific set of graphs with less than $n^{1/4} + O(\lambda(\log n)^6) = o(n^{1/3})$ vertices, and $P_{r_1,r_2}$ counts configurations in which the explorations from $x$ and $y$ are disjoint. Since each exploration 'uses up' $o(n^{1/3})$ vertices, it is easy to see (for example using a version of Lemma 11 starting with two vertices) that

$$P_{r_1,r_2} \sim P_{r_1}P_{r_2}. \tag{39}$$

For any $r$, let $\hat\mu_r$ denote $nP_r$, the expected size of $V_0 \cap S_r$; recall that $\tilde\mu_r = E|S_r \setminus B_1|$, so $\hat\mu_r = \tilde\mu_r + o(1)$ by Lemma 12(a). Also, for integer $k \ge 1$ define $\hat\mu(r_1,r_2,k)$ to be the expected number of ordered pairs $(x,y)$ of vertices with $x \in V_0 \cap S_{r_1}$, $y \in V_0 \cap S_{r_2}$, and $d(x,y) > 2(t_0+10) + k$. Since $V_0 = V$ whp, the number of such pairs essentially determines the diameter. From the above observations and Lemma 12(b),

$$\hat\mu(r_1,r_2,k) \sim \hat\mu_{r_1}\hat\mu_{r_2}\Big(\exp\Big(-r_1 r_2(1+O(\lambda^{-2}))\sum_{i=1}^{k}\lambda^i/n\Big) + o(n^{-3})\Big) \tag{40}$$

provided $k \ge \ell(r_1) + \ell(r_2)$ and $r_1$ and $r_2$ satisfy the constraints of Lemma 12. Note that we shall consider values of $k$ that are at least $\log n/\log\lambda - 30$, which is larger than $2\ell(r)$ for any $r > 0$.

Define $\mu_r = nP(|X_{t_0+10}| = r)$, which we shall analyse using Lemmas 10 and 9. We claim that

$$\hat\mu_r \le \tilde\mu_r \le \mu_r(1+o(1)); \tag{41}$$

indeed, the first inequality holds by definition. If $r > n^{1/4}$ then $\tilde\mu_r = 0$; otherwise the second inequality follows from Lemma 11, summing over the possible neighbourhoods of $x$. In the other direction, although we shall not use these bounds, note that for $r \le n^{1/4}$ we have

$$\hat\mu_r \ge \mu_r(1+o(1)) + o(1),$$

since $\hat\mu_r \ge \tilde\mu_r + o(1)$ by Lemma 12(a), and $\tilde\mu_r \ge \mu_r(1+o(1)) + o(1)$ from Lemma 11 together with (35).

We next show that vertices in sets $V_0 \cap S_r$ with $r \ge \lambda^{13.1}$ will not determine the diameter of the graph, for the reason that they join too quickly to all vertices under consideration: we claim that whp all such vertices have distance at most $\log n/\log\lambda + 2t_0 - 0.05$ from all other vertices. We will see later that the diameter is greater than this whp. To establish this claim, without loss of generality consider only $r_1 \ge \lambda^{13.1}$ and $r_2 \ge \lambda^9/4$. Note that the conditions on the $r_i$ in Lemma 12(b) are so restrictive because it aims for a fairly accurate asymptotic estimate. In this case we only need to observe that if $x \in V_0 \cap S_r$ for $r = r_1$ or $r_2$, by definition of $B_2$, $|\Gamma_{t_0+10+i}(x)| \sim \lambda^i r$ until the neighbourhoods reach size at least $(\log n)^6$ (which they may do at $i = 0$), and for larger neighbourhoods up to size $n^{2/3}$, (38) provides the same relation with probability at least $1 - o(n^{-4})$. Summing over all $O(n^2)$ pairs of vertices $x$ and $y$ gives

$$\hat\mu(r_1,r_2,k) \le \hat\mu_{r_1}\hat\mu_{r_2}\exp\big(-(1+o(1))r_1 r_2\lambda^k/n\big) + o(n^{-1}), \tag{42}$$

which is similar to (40) but does not have the same restrictions on $r_1$ and $r_2$. For $k = \lfloor\log n/\log\lambda - 20.05\rfloor$ we have $r_1 r_2\lambda^k/n \ge (r_1/\lambda^{13.1})(4r_2/\lambda^9)\lambda^{1.05}$. Now (41) and (36), together with $\lambda_* = e^{-\lambda + O(\log\lambda)}$, give

$$\hat\mu_r = O(1)\exp\big((1+o(1))\lambda(\log(3r)/\log\lambda - 8)\big).$$

Summing the resulting bound on $\hat\mu(r_1,r_2,k)$ over all $r_1 \ge \lambda^{13.1}$ and $r_2 \ge \lambda^9/4$ gives $o(1)$, as required to establish the claim. (The key observation is that when $r_1$ and $r_2$ take their minimum values, we have $\hat\mu_{r_1}\hat\mu_{r_2} = \exp(O(\lambda))$, while the exponential factor in (42) is $\exp(-\lambda^{1.05})$. When $r_1$ and $r_2$ increase, so does $\hat\mu_{r_1}\hat\mu_{r_2}$, but the exponential factor decreases more than fast enough to compensate.)

Recalling (37), let $R$ be the set of indices $r$, $\lambda^9/4 \le r \le \lambda^{13.1}$, for which $\hat\mu_r \ge \lambda^{-14}$. Then, by the union bound, the expected number of vertices in all sets $V_0 \cap S_r$ with $r$ in this range but not in $R$ is $o(1)$, i.e., there are whp no such vertices. Since $V_0 = V$ whp, using the observation above about sets $S_r$ with $r \ge \lambda^{13.1}$ and (37), we have shown that

$$\mathrm{diam}(G) = \max_{(r_1,r_2)\in R^2}\max\{d(x,y) : x \in V_0 \cap S_{r_1},\ y \in V_0 \cap S_{r_2}\} \quad\text{whp}. \tag{43}$$

It only remains to examine $r_1$ and $r_2$ in $R$. Let $k_0(r_1,r_2)$ denote the maximum $k$ such that $\hat\mu(r_1,r_2,k) \ge \lambda^{-27}$. (This number $k_0$ depends on $n$.) Then $\hat\mu(r_1,r_2,k_0(r_1,r_2)+1) < \lambda^{-27}$. Let $k_{\max}$ be the maximum value of $k_0$ over all pairs $(r_1,r_2)$ in $R^2$. From (42) and the definition of $R$, it is easy to check that $k_{\max} = \log n/\log\lambda + O(1)$. Setting $f(n,\lambda) = 2(t_0+10) + k_{\max}$, to prove the first part of Theorem 2 we shall show that the diameter is whp either $f(n,\lambda)$ or $f(n,\lambda)+1$. Since $|R^2| = O(\lambda^{26.2})$, by the union bound, the expected number of pairs of vertices $x$ and $y$ counted in (43) at distance greater than $f(n,\lambda)+1$ is $o(1)$. Thus $\mathrm{diam}(G) \le f(n,\lambda)+1$ holds whp.

To see that the diameter is whp at least $f(n,\lambda) = 2(t_0+10) + k_{\max}$ we shall look for vertices at this distance in suitable sets $S_{r_i}$. Choose $(r_1,r_2)$ in $R^2$ with $k_0(r_1,r_2) = k_{\max}$. Note that $\hat\mu(r_1,r_2,k_{\max}) \ge \lambda^{-27}$. That is, from (40),

$$\hat\mu_{r_1}\hat\mu_{r_2}\exp\Big(-r_1 r_2(1+O(\lambda^{-2}))\sum_{i=1}^{k_{\max}}\lambda^i/n\Big) + o(\hat\mu_{r_1}\hat\mu_{r_2}/n^3) \ge (1+o(1))\lambda^{-27}.$$

By definition $\hat\mu_{r_i} \le n$, and $n^{-1} = o(\lambda^{-27})$, so

$$\hat\mu_{r_1}\hat\mu_{r_2}\exp\Big(-r_1 r_2(1+O(\lambda^{-2}))\sum_{i=1}^{k_{\max}}\lambda^i/n\Big) \ge \lambda^{-27}(1+o(1)).$$

Since $r_i \le \lambda^{13.1}$ by definition of $R$, from (36) and (41) we have $\hat\mu_{r_i} \le \lambda_*^{-6} \le e^{6\lambda}$ for $i = 1$ and 2. Hence

$$\exp\Big(-r_1 r_2(1+O(\lambda^{-2}))\sum_{i=1}^{k_{\max}}\lambda^i/n\Big) \ge \lambda^{-27}e^{-12\lambda} \ge e^{-13\lambda},$$

if $n$ is large enough. Taking logs and stopping the sum one step earlier, this gives

$$-r_1 r_2(1+O(\lambda^{-2}))\sum_{i=1}^{k_{\max}-1}\lambda^i/n \ge -13. \tag{44}$$

Hence, by Lemma 12(b), vertices $x$ and $y$ whose $(t_0+10)$-neighbourhoods have sizes $r_1$ and $r_2$ respectively have a significant (at least $e^{-13} + o(1)$) probability of being at distance at least $2t_0 + 20 + k_{\max}$. Although by design we expect a large number of pairs of such vertices $x$ and $y$, it is still possible that the expected number of possibilities for either $x$ or $y$ goes to 0! Our strategy is to consider vertices with $|\Gamma_{t_0+10}(\cdot)|$ around $2000r_i$, say, and show that this gives us many vertices $x$ and $y$ to work with. We also impose certain extra conditions on their neighbourhoods needed later.

For $i = 1, 2$, since $r_i$ is in $R$, we have $\hat\mu_{r_i} \ge \lambda^{-14}$. Now (41) shows that $P(|X_{t_0+10}| = r_i) = \mu_{r_i}/n \ge (1+o(1))\lambda^{-14}/n$. By (33) it follows that

$$\lambda_*^{g(t_0+10-\log(3r_i)/\log\lambda)} \ge (1/3+o(1))\lambda^{-14}/n. \tag{45}$$

Let $\omega_i = 1000r_i \le \lambda^{14}$. By (25) and (45) we have

$$\lambda_*^{g(t_0+10-\log\omega_i/\log\lambda)} \ge (1/3+o(1))\lambda^{250/3}\lambda^{-14}/n \ge \lambda^{40}/n, \tag{46}$$

if $\lambda$ is large enough. For $i = 1, 2$, applying Lemma 10 with $\omega = \omega_i$ and $t = t_0 + 10$, there is some $\rho_i$ with $\omega_i/3 \le \rho_i \le 2\omega_i$ such that the event $F_0 \cap F_1 \cap \{|X_{t_0+10}| = \rho_i\}$ described in Lemma 10 has probability $\pi_i$ satisfying

$$\pi_i \ge \lambda_*^{g(t_0+10-\log\omega_i/\log\lambda)}/(3\lambda\omega_i) \ge \lambda^{39}/(n\omega_i) \ge \lambda^{25}/n, \tag{47}$$

using ([46]) . Let E Pi (x) denote the event that x ^ B\ and the neighbourhoods of x up 
to distance to + 10 form a tree that, when viewed as a branching process, satisfies the 
conditions Fq n Fl n {|X to+ io| = By (|35|) and Lemma [T_T] we have P(F Pi (x)) ~ 
7Tj + o{n~ 2 ). Since 7r, is much larger than n~ 2 , it follows that F(E Pi (x)) ~ 7Tj > A 25 /n. 

Let F Pi (x) be the event that F Pi (x) holds and x £ Vo, so the only additional condition 
is that x i B 2 . Let Pi = F(E Pi (x)). Since F(x £ B 2 ) = o(l/n), we have 

P J ~P(F Pi (x))~7r i >A 25 /n. (48) 

Note also for later that, writing $t_i$ and $a_i$ for the integer and fractional parts of $t_0 + 10 - \log\omega_i/\log\lambda$, and writing $F_0(x)$ for the event that the neighbourhoods of $x$ satisfy the diamond condition to distance $t_i$ (corresponding to $F_0$ in Lemma 10), then starting from the first statement of Lemma 10 and arguing as above we have

$$P\big(F_0(x) \cap \{|\Gamma_{t_i+1}(x)| \le \lambda^{1-a_i}\}\big) \sim \lambda_*^{g(t_0+10-\log\omega_i/\log\lambda)}.$$

Using the first inequality in (47) it follows that

$$P_i \ge \lambda^{-15}P\big(F_0(x) \cap \{|\Gamma_{t_i+1}(x)| \le \lambda^{1-a_i}\}\big). \tag{49}$$

In other words, once we have explored the neighbourhoods to the 'branching vertex' $x_0$, and found few neighbours in the next step, it is not that unlikely that $E_{\rho_i}(x)$ holds.

Given distinct vertices $x$ and $y$, as in (39) the probability that $E_{\rho_1}(x)$ and $E_{\rho_2}(y)$ hold and $d(x,y) > 2(t_0+10) + \ell(\rho_1) + \ell(\rho_2)$ is $(1+o(1))P_1P_2$. Furthermore, conditional on this holding, then by a variant of Lemma 12 that simply includes extra conditions on the neighbourhoods of a vertex up to distance $t_0 + 10$, the conditional probability $P$ that $d(x,y) \ge 2(t_0+10) + k_{\max}$ satisfies

$$P = \exp\Big(-\frac{\rho_1\rho_2}{n}(1+o(1))\sum_{i=1}^{k_{\max}-1}\lambda^i\Big) + o(n^{-3}). \tag{50}$$

Since $\rho_i \le 2\omega_i = 2000r_i$, using (44) shows that $P \ge \exp(-O(1))$, so $P = \Theta(1)$.

Let us call an ordered pair $(x,y)$ a regular far pair if $E_{\rho_1}(x)$ and $E_{\rho_2}(y)$ hold, and $d(x,y) \ge 2(t_0+10) + k_{\max}$, and let $N$ denote the number of regular far pairs; our aim is to show that $N \ge 1$ holds whp. From (48) we have $nP_1, nP_2 \ge (1+o(1))\lambda^{25} \to \infty$, so

$$EN \sim n^2P_1P_2P \to \infty.$$

Unfortunately, we cannot use the trick from Subsection 2.2 to complete the proof: this trick, which allowed us to avoid considering the second moment of the number of pairs of vertices at large distance, needed $P \sim 1$. This will in fact hold for almost all values of the parameters in the present setting, but not all. Moreover, we now have less tolerance in the final estimate of the diameter, and consequently less flexibility. Instead we apply the second moment method directly to $N$. In the arguments that follow we shall avoid using the fact that $P = \Theta(1)$, using only

$$P \ge n^{-1/20}, \tag{51}$$

say; this will be useful later.

Let $M = E(N(N-1))$ denote the expected number of pairs $((x,y),(z,w))$ of regular far pairs; our aim is to show that $M \sim (EN)^2$. Note that the number of distinct vertices in $\{x,y,z,w\}$ may be 2, 3 or 4. The contribution to $M$ from sets with 2 distinct vertices is trivially at most $2EN = o((EN)^2)$ (the factor 2 arises only if $\rho_1 = \rho_2$). Let us leave aside the case of 3 vertices, noting only that we expect the contribution from pairs with $x = z$, say, to be asymptotically

$$nP_1(nP_2)^2P^2 \sim (EN)^2/(nP_1) = o((EN)^2),$$

since $nP_1 \to \infty$. The argument for the case of 4 distinct vertices that we shall now give adapts easily to show this.

Let $M_0$ be the contribution to $M$ arising from sets of 4 distinct vertices $\{x,y,z,w\}$ whose neighbourhoods up to distance $t_0 + 10 + \ell(\rho_i)$ are all disjoint, where $i = 1$ or 2 as appropriate. To estimate $M_0$, explore from four distinct vertices, and test whether the relevant events $E_{\rho_i}(\cdot)$ hold with the neighbourhoods disjoint. As in (39), this has probability $(1+o(1))P_1^2P_2^2$. Our aim is to bound from above the conditional probability that $d(x,y), d(z,w) \ge 2(t_0+10) + k_{\max}$, showing that it is at most $(1+o(1))P^2$. Since none of $x, y, z, w$ is in $B_2$, the neighbourhoods have already reached size at least $\log^6 n$. From this point onwards, as before, we may assume they grow at almost exactly the expected rate. Note that we may ignore events of conditional probability $o(n^{-1/10}) = o(P^2)$, since we have already conditioned on an event of probability $(1+o(1))P_1^2P_2^2$.

Since we stop the explorations when the neighbourhoods are no larger than $n^{3/5}$, say, we may assume that any intersections between neighbourhoods are small, involving at most a fraction $n^{-1/3}$ of the vertices in a neighbourhood. Such small intersections do not materially affect the calculations in Lemma 12(b), so the conditional probability that $d(x,y), d(z,w) \ge 2(t_0+10) + k_{\max}$ is indeed $(1+o(1))P^2$.

It remains to deal with cases where some of the neighbourhoods meet within distance $t_0 + 10 + \ell(\rho_i)$ from the respective vertices. As above just after (48), let $t_i$ be the relevant parameter $t'$ in Lemma 10, where $i = 1$ or 2 depending on which vertex we consider. Note that to have the property $E_{\rho_i}(v)$, all our starting vertices $v$ must have the property that $\Gamma_{t_i}(v)$ contains a unique vertex $v_0$. Also, within the tree up to this point, $v$ must be the unique vertex at maximal distance from $v_0$, so our 'diamond' condition holds. As in Subsection 2.2 it follows that in a quadruple contributing to $M$, the neighbourhoods cannot meet before the corresponding vertices $v_0$, so the minimum possible distance between starting vertices is $t_i + t_j$.

Returning to the random graph without conditioning, let us explore the neighbourhoods of our 4 distinct vertices $x, y, z, w$ out to distance $t_i - 1$ in each case, assuming these explorations are disjoint, and that there are no edges between the final sets (such an edge would give distance $t_i + t_j - 1$). Furthermore, let us test for each of these vertices $v$ how many neighbours $\Gamma_{t_i-1}(v)$ has in the remaining set $U$ of 'unused' vertices, but not which neighbours it has. If our quadruple is to contribute, in each case there must be exactly one neighbour, $v_0$. Now conditional on the information so far, the probability that $x_0 = z_0$, say, is exactly $1/|U| \sim 1/n$. If this happens, then going forwards, the remaining calculations are exactly as if we had $x = z$ in the beginning. Summing the corresponding contributions to $M$, the total from cases with $x \ne z$ but $x_0 = z_0$ has an extra factor of $n$ from the choice of $z$ (compared to the case $x = z$), but also an extra factor that is asymptotic to $1/n$ as noted above. (There is also the extra factor of at most 1 from the condition on the neighbourhoods of $z$ up to distance $t_i - 1$; we can ignore this.) In total, the contribution here is at most that with $x = z$, which is $o((EN)^2)$ as noted above. (The argument here is not circular: when considering three vertices, a collision of this form reduces to the two-vertex case.)

So we may assume that $x_0$, $y_0$, $z_0$ and $w_0$ are distinct. Repeating the trick above, let us first test how many neighbours each has among the unused vertices (not testing edges such as $x_0z_0$ for now). For our quadruple to contribute, by definition of $E_{\rho_i}(\cdot)$ the numbers must be at most $\lambda^{1-a_i}$ with $i = 1, 2$ as appropriate. Since there are $n - O(n^{1/4})$ unused vertices, the probability of this happening is very close to $P(\mathrm{Po}(\lambda) \le \lambda^{1-a_i})$. Using (49), it follows that the probability that all our tests so far succeed is at most $\lambda^{60}P_1^2P_2^2$. Hence, going forward, we may neglect an event of probability smaller than $n^{-1/4} = o(\lambda^{-61})$, say. So far we revealed the numbers of neighbours, which were all at most $\lambda$, but not which vertices they were. But the probability of a collision is $O(\lambda^2/n) = o(n^{-1/4})$, which is negligible. Also, the probability of an edge between $x_0$ and $z_0$, say, is $O(\lambda/n) = o(n^{-1/4})$. Recall that any vertex in a pair counted in $N$, or a quadruple in $M$ or $M_0$, has the property $E_{\rho_i}$ for some $i$ and is hence in $V_0 = V(G) \setminus (B_1 \cup B_2)$. Exploring further up to distance $10 + \ell(\rho_i)$ steps from each vertex $v_0$, where $i = 1$ or 2 as appropriate, assuming typical growth as we may, the probability that two neighbourhoods meet, starting as they do with at most $\lambda$ neighbours of $v_0$, is

So we may assume this does not happen, and hence $M - M_0$ is negligible compared with $M$.

In summary, it follows that $M = E(N(N-1)) \sim n^4P_1^2P_2^2P^2 \sim (EN)^2 \to \infty$, so the second moment method shows that $N \ge 1$ whp. But then the diameter is at least $2(t_0+10) + k_{\max}$, completing the proof of the first half of Theorem 2.
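For completeness, the second moment step used here can be written out as follows. This is the standard Chebyshev argument, stated under the two facts established above, namely $E(N(N-1)) \sim (EN)^2$ and $EN \to \infty$; it is a sketch of the routine calculation rather than a quotation from the proof.

```latex
\begin{align*}
  \mathbb{E}(N^2) &= \mathbb{E}\bigl(N(N-1)\bigr) + \mathbb{E}N, \\
  \mathbb{P}(N = 0)
    &\le \mathbb{P}\bigl(|N - \mathbb{E}N| \ge \mathbb{E}N\bigr)
     \le \frac{\operatorname{Var}(N)}{(\mathbb{E}N)^2} \\
    &= \frac{\mathbb{E}\bigl(N(N-1)\bigr)}{(\mathbb{E}N)^2} - 1 + \frac{1}{\mathbb{E}N}
     \;\longrightarrow\; 0 .
\end{align*}
```

The first term tends to 1 because $E(N(N-1)) \sim (EN)^2$, and the last tends to 0 because $EN \to \infty$, so $N \ge 1$ whp.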

The second part of the theorem states that for 'most' values of $n$ the diameter is almost determined, and gives a formula. The general exact formula is a bit complicated if we want to include all values of the parameters, even restricting to those for which the diameter is almost determined. In formulating Theorem 2 we omitted some additional problematic values of $n$, giving a much simpler formula. One way to explain the source of the problematic cases is that, although the difference between the bound (33) and the equation following it is usually negligible, when the typical diameter is close to jumping to the next integer, the fact that the bounds do not exactly match becomes important.

Writing $\{x\}$ for $x - \lfloor x\rfloor$, in proving the second part of the theorem we may assume that

$$5\varepsilon \le \{\log n/\log\lambda\} \le 1 - 5\varepsilon, \qquad 5\varepsilon \le \{\log n/\log(1/\lambda_*)\} \le 1 - 5\varepsilon, \tag{52}$$

where $\varepsilon$ is some positive constant, which we may take to be smaller than 1/10.

Let us first consider some values of $r$ that, as it will turn out, in many cases (i.e., for many values of $n$) typically determine the diameter of the random graph.

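Define $q_n$ to be the infimum of $q$ such that $n^{-1} \ge \lambda_*^{t_0+g(q)}$. From the definition of $g$, with $\lambda$ fixed and $q$ varying, $\lambda_*^{g(q)}$ jumps by a factor of at most $\lambda$ at each discontinuity. (With $X \sim \mathrm{Po}(\lambda)$, the ratios $P(X \le k+1)/P(X \le k)$ are between 1 and $\lambda$, while the ratio $\lambda_*^{-1}P(X \le 1)/P(X \le \lambda)$ is asymptotically $1/P(X \le \lambda) \sim 2$.) Thus for large $n$

$$n^{-1} = \xi\,\lambda_*^{t_0+g(q_n)} \tag{53}$$

for some $\xi = \xi(n)$ between 1 and $\lambda$. We call $n$ "normal" if $q_n \le \varepsilon$ and $g(q_n) \ge 4\lambda^{-\varepsilon}$.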

Taking logs in (53), since $\xi = \lambda^{O(1)}$, while $t_0 = \lfloor\log n/\log(1/\lambda_*)\rfloor$, we have $g(q_n) = \{\log n/\log(1/\lambda_*)\} + o(1)$, so $g(q_n) \ge \varepsilon \ge 4\lambda^{-\varepsilon}$ if $n$ is large. Since for any constant $0 < a < 1$ we have $g(a) \to 1$, while $g(q_n) \le 1 - \varepsilon$, it follows that $q_n = o(1)$, so any (large enough) $n$ satisfying (52) is normal.

Putting $t = t_0 + 10$ and $\omega = \lambda^{t_1}$ such that $t_1 = 10 - q_n - \log 5/\log\lambda$ in Lemma 9, we find

$$P\big(0 < |X_{t_0+10}| < \lambda^{10-q_n}/10\big) \le 2\lambda_*^{g(t_0+q_n+\log 5/\log\lambda)} = 2\lambda_*^{t_0+g(q_n+\log 5/\log\lambda)},$$

which is at most $2\lambda_*^{t_0+g(q_n)}/\lambda^{5/4} = o(n^{-1})$ by (25) and (53). Hence, arguing as for (37), we only need to consider vertices in $S_r$ with $r \ge \lambda^{10-q_n}/10$.

Put $b = \lfloor\log n/\log\lambda + 2q_n\rfloor$ and $\varphi = \{\log n/\log\lambda + 2q_n\}$. Call $n$ "standard" if $3\varepsilon \le \varphi \le 1 - 3\varepsilon$. Since $q_n \le \varepsilon$ for normal $n$, any $n$ satisfying (52) is standard.

As noted above, for normal $n$ we only need to consider $r_1$ and $r_2$ at least $\lambda^{10-q_n-o(1)}$, and for such cases (40) gives

$$\hat\mu(r_1,r_2,b-18) \le (1+o(1))\hat\mu_{r_1}\hat\mu_{r_2}\Big(\exp\big(-\lambda^{20-2q_n-o(1)+b-18-\log n/\log\lambda}\big) + o(n^{-3})\Big) = (1+o(1))\hat\mu_{r_1}\hat\mu_{r_2}\Big(\exp\big(-\lambda^{2-\varphi-o(1)}\big) + o(n^{-3})\Big).$$

For standard $n$ the exponential above is at most $\exp(-\lambda^{1+\varepsilon-o(1)}) + o(n^{-3})$. Hence for such $n$ the quantity $\hat\mu(r_1,r_2,b-18)$ goes to 0 quickly unless $\hat\mu_{r_1}$ or $\hat\mu_{r_2}$ is much bigger than $e^{100\lambda}$, say. From arguments as above, we know this forces $r_1$ and $r_2$ to be much larger than the typical values of around $\lambda^{10}$, at least $\lambda^{100}$, say, and then $\hat\mu(r_1,r_2,b-18)$ is much smaller. Using the argument that earlier permitted us to restrict parameters to the set $R$, such cases can be neglected. Thus, whp there are no vertices in sets $S_{r_1} \cap V_0$, $S_{r_2} \cap V_0$ that have distance greater than $2(t_0+10) + b - 18$, for any $r_1$ or $r_2$. Hence the diameter is at most $2t_0 + b + 2$ whp for any $n$ satisfying (52) (or indeed, though we won't need it, for any normal standard $n$).

Continuing with standard normal $n$, let $\omega = \lambda^{10}$. Then using (53) and since $g(q_n) \ge 4\lambda^{-\varepsilon}$, $\lambda_*^{-1} = e^{\lambda+O(\log\lambda)}$ and $\xi = e^{O(\log\lambda)}$,

^9(*o+10— log w/ log A) _ ^t 

= A; 9(9 "^/n 

> exp(4A 1_s + 0(logA))/n 

> exp(3A 1 - £ )/n. 

Since the final bound is larger than $\lambda^{40}/n$ if $\lambda$ is large enough, the bound (46) holds with $\omega_i = \omega$ for $i = 1, 2$. The calculations down to (49) go through as before, now with $\rho_1 = \rho_2 = \rho$ and $\lambda^{10}/3 \le \rho \le 2\lambda^{10}$. This time we have $P_1 = P_2 \sim \pi_i \ge \exp(3\lambda^{1-\varepsilon})\lambda^{-O(1)}/n$, using (47) and the bound above.

Writing $N$ for the number of pairs of vertices with property $E_\rho$ at distance at least $2(t_0+10) + b - 18$, as before we have $EN \sim (P_1n)^2P$, with

$$P = \exp\Big(-\frac{\rho^2}{n}(1+o(1))\sum_{i=1}^{b-19}\lambda^i\Big) + o(n^{-3})$$

in place of (50). Since $\rho \le 2\lambda^{10}$ we have

$$\log(1/P) \le (4+o(1))\lambda^{20+b-19-\log n/\log\lambda} \sim 4\lambda^{1+2q_n-\varphi} \le 4\lambda^{1-\varepsilon}$$

for normal standard $n$. Since $P_1n \ge \exp(3\lambda^{1-\varepsilon} - O(\log\lambda))$, we thus have $EN \to \infty$. The second moment argument goes through as before to show that whp $N \ge 1$, so the diameter is at least $2(t_0+10) + b - 18$. (Note that we still have (51) since (52) forces $\lambda_*$ to be much larger than $1/n$, and hence $\lambda = O(\log n)$, so $\log P = o(\log n)$.) Hence, from the upper bound shown above, the diameter of the graph is, for normal standard $n$, whp equal to

$$2t_0 + b + 2 = 2\lceil\log n/\log(1/\lambda_*)\rceil + \lfloor\log n/\log\lambda + 2q_n\rfloor,$$

since being normal implies that $n$ is not an integer power of $\lambda_*$. Using (52) again, and recalling that $q_n \le \varepsilon$, this is

$$2\lceil\log n/\log(1/\lambda_*)\rceil + \lfloor\log n/\log\lambda\rfloor,$$

completing the proof. □



4 Just above the critical point 

In this section we shall prove Theorem 3, which is the analogue of Theorem 1 for $G(n,\lambda/n)$, where $\lambda = 1 + \varepsilon$ with $\varepsilon = \varepsilon(n)$ tending to zero at a suitable rate. Roughly speaking, we shall simply repeat the arguments in Section 2 more carefully; however, there are many additional complications that we shall contend with as we go. As mentioned in the introduction, we shall also prove a stronger result, describing the (normalized) limiting distribution of the correction term; we postpone the somewhat unpleasant statement of this result until Section 5.

Throughout this section we write $\lambda$ for $1 + \varepsilon$, always assuming that $0 < \varepsilon < 1/10$, and often that $\varepsilon = \varepsilon(n) \to 0$. As before, we write $\lambda_*$ for the unique solution $\lambda_* < 1$ to $\lambda_*e^{-\lambda_*} = \lambda e^{-\lambda}$, so

$$\lambda_* = 1 - \varepsilon + \tfrac{2}{3}\varepsilon^2 - \tfrac{4}{9}\varepsilon^3 + O(\varepsilon^4),$$

noting (for convenience) that $\lambda_* \ge 1 - \varepsilon$.
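The expansion of $\lambda_*$ can be checked numerically. The sketch below solves $xe^{-x} = \lambda e^{-\lambda}$ on $(0,1)$ by bisection and compares the root with the cubic truncation of the series; the function names are hypothetical, and the tolerance $\varepsilon^4$ is an assumption about the size of the remainder that holds for the small values tested.

```python
import math

def dual_parameter(lam: float) -> float:
    """Root x in (0, 1) of x*exp(-x) = lam*exp(-lam), for lam > 1."""
    target = lam * math.exp(-lam)
    lo, hi = 0.0, 1.0  # x*exp(-x) is increasing on [0, 1]
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid * math.exp(-mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def cubic_series(eps: float) -> float:
    """Truncated expansion 1 - eps + (2/3) eps^2 - (4/9) eps^3."""
    return 1 - eps + (2 / 3) * eps**2 - (4 / 9) * eps**3

# Difference between the exact root and the truncated series.
errors = {eps: dual_parameter(1 + eps) - cubic_series(eps)
          for eps in (0.1, 0.05, 0.01)}
```

For each tested $\varepsilon$ the difference is of order $\varepsilon^4$, consistent with the $O(\varepsilon^4)$ remainder, and the computed root lies above $1 - \varepsilon$.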

The overall plan of the proof is as for the cases $\lambda$ constant and $\lambda \to \infty$. We shall treat the second phase (regular growth) in Subsection 4.1, and the first phase, approximation by branching process, in Subsection 4.2. To be able to carry out the third phase, we still need to study the distribution of the time the branching process takes to reach a large size. We do this in Subsection 4.3, and prove various other branching process lemmas we shall need in Subsection 4.4. In Subsection 4.5 we consider the typical distances in the 2-core. Finally, armed with all these results, we prove the lower bound on the diameter in Subsection 4.6, and the upper bound in Subsection 4.7; this turns out to be not as easy as one might expect, as both proofs involve considerable re-examination of the first phase, the early growth of the neighbourhoods.

One complication concerns the wedge condition used in Section 2; here this turns out to have probability $\Theta(\varepsilon^3)$, or $\Theta(\varepsilon^2)$ if we condition on the vertex being in the giant component. In Section 3 we used a much stronger 'diamond' condition, that allowed us to simply avoid dependence between the neighbourhoods of the vertices we considered. Unfortunately, the diamond condition corresponds roughly to two wedge conditions, and has probability $\Theta(\varepsilon^4)$ after conditioning on being in the giant component. When $\varepsilon \to 0$, we cannot afford to give up a factor $\varepsilon^2$ in the number of vertices we consider.

Except that Subsections 4.3 and 4.4 belong together, Subsections 4.1 to 4.4 may be read in any order. We have chosen the present order as the first two subsections are relatively simple, and may be seen as motivating the extensive branching process analysis that follows.

Throughout we write $\Lambda$ for $\varepsilon^3 n$, and assume that $\Lambda \to \infty$.

4.1 Large neighbourhoods and meeting in the middle 

In this subsection we show that whp once the neighbourhoods of a vertex become large,
they grow at the expected rate until reaching size $\sqrt{\varepsilon n}\,\log\Lambda$, say. This is not quite as
simple as Lemma El since when e is small, even when the neighbourhoods are fairly 
large, the expected increase in size from one step to the next may still be smaller than
the standard deviation. Hence it may well happen that $|\Gamma_t(x)|$ is smaller than $|\Gamma_{t-1}(x)|$
for some $t$. However, this is unlikely to happen for many $t$ in a row.

We shall start by proving a corresponding growth result for a Galton-Watson branching process. It may well be that a similar result exists in the literature, but we have not found it; the key point is the dependence of the bounds on the parameters of the branching process. The general theme here and throughout this section is that the behaviour of the branching process is only 'regular' once it reaches sizes larger than $1/\varepsilon$, and that it is best seen on time scales of $1/\varepsilon$.

Given parameters $\mu = 1+\varepsilon$ and $n$, consider a Galton-Watson branching process $(Z_t)_{t \ge 0}$ starting with a fixed number $N_0$ of particles, in which each particle has a binomial number of children in the next generation, with parameters $n$ and $\mu/n$. Let $N_t = |Z_t|$ denote the number of particles in generation $t$.

Lemma 13. Let $0 < \varepsilon, \delta < 1$ and $n$ be given, and define $(N_t)$ as above, with $\mu = 1+\varepsilon$. Writing $\omega$ for $\varepsilon N_0$, the probability that

\[ (1-\delta) N_0 \mu^t \le N_t \le (1+\delta) N_0 \mu^t \tag{54} \]

holds for all $t \ge 0$ is at least $1 - O(e^{-c_0 \delta^2 \omega})$, where $c_0 > 0$ is an absolute constant, and the implicit constant in the $O(\cdot)$ notation is absolute.

Proof. We may and shall assume that $\delta^2\omega \ge 100$, say; otherwise, there is nothing to prove.






We may construct $(Z_t)$ in small steps in the following standard way: let $A_1, A_2, \ldots$ be independent binomial $\mathrm{Bi}(n, \mu/n)$ random variables. As we construct the process, we number the particles in order of the time they are born; we start by numbering the particles of $Z_0$ with $1, 2, \ldots, N_0$ in any order. To define $(Z_t)$, simply take $A_i$ to be the number of children of the $i$th particle. Writing $S_t$ for $\sum_{t' < t} N_{t'}$, we then have

\[ N_t = N_0 + \sum_{i \le S_t} (A_i - 1) = N_0 + B_{S_t} - S_t, \tag{55} \]

where $B_i = \sum_{j \le i} A_j$.
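
The bookkeeping behind (55) can be made concrete with a small Python sketch. This is an illustration under stated assumptions: the function names are hypothetical, and each $\mathrm{Bi}(n,\mu/n)$ offspring count is sampled naively as a sum of $n$ Bernoulli trials. The identity checked at the end is deterministic and holds for any offspring counts.

```python
import random

def sequential_branching(n0, n, mu, generations, seed=0):
    """Build the process particle by particle; returns ([N_0, ..., N_T], [A_1, A_2, ...])."""
    rng = random.Random(seed)
    offspring = []   # offspring[i - 1] is A_i, the number of children of particle i
    sizes = [n0]     # sizes[t] is N_t
    for _ in range(generations):
        born = 0
        for _ in range(sizes[-1]):
            # One Bi(n, mu/n) sample as a sum of n Bernoulli(mu/n) trials.
            a = sum(rng.random() < mu / n for _ in range(n))
            offspring.append(a)
            born += a
        sizes.append(born)
    return sizes, offspring

def identity_55_holds(sizes, offspring):
    """Check N_t = N_0 + B_{S_t} - S_t for every generation t that was built."""
    for t, n_t in enumerate(sizes):
        s_t = sum(sizes[:t])            # S_t = N_0 + ... + N_{t-1}
        b_s_t = sum(offspring[:s_t])    # B_{S_t} = A_1 + ... + A_{S_t}
        if n_t != sizes[0] + b_s_t - s_t:
            return False
    return True

sizes, offspring = sequential_branching(n0=20, n=50, mu=1.2, generations=8)
print(sizes, identity_55_holds(sizes, offspring))
```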
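For $t \ge -1/\varepsilon$ set

\[ \delta_t = \frac{\varepsilon\delta}{8} \int_{s=-1/\varepsilon}^{t} \mu^{-s/4} \, ds, \]

and set $\delta_t = 0$ if $t < -1/\varepsilon$. Note that $\delta_t$ is an increasing function of $t$, with

\[ 0 \le \delta_t \le \frac{4\varepsilon\delta}{8\log\mu}\, \mu^{1/(4\varepsilon)} \le \delta, \]

using $(1+\varepsilon)^{1/(4\varepsilon)} \le e^{1/4}$ and $\varepsilon/\log\mu = \varepsilon/\log(1+\varepsilon) \le 1/\log 2$. The key property of $\delta_t$ is that if $t \ge 0$ and $r = t - 1/\varepsilon$, then

\[ \delta_t - \delta_r \ge (t-r)\,\frac{\varepsilon\delta}{8}\,\mu^{-t/4} = \delta\mu^{-t/4}/8. \tag{56} \]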
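
As a numerical illustration (a sketch, not part of the proof), the following Python code evaluates $\delta_t$ in closed form, assuming the definition $\delta_t = (\varepsilon\delta/8)\int_{-1/\varepsilon}^{t}\mu^{-s/4}\,ds$ with $\mu = 1+\varepsilon$, and checks both $0 \le \delta_t \le \delta$ and property (56) over a range of integer $t$. The helper names are hypothetical.

```python
import math

def delta_t(t, eps, delta):
    """(eps * delta / 8) times the integral of mu**(-s/4) over s in [-1/eps, t]."""
    if t < -1 / eps:
        return 0.0
    k = math.log1p(eps) / 4                      # mu**(-s/4) = exp(-k * s)
    integral = (math.exp(k / eps) - math.exp(-k * t)) / k
    return eps * delta / 8 * integral

def check_delta(eps, delta, t_max):
    """Verify the range of delta_t and property (56) for t = 0, 1, ..., t_max."""
    mu = 1 + eps
    for t in range(t_max + 1):
        d_t = delta_t(t, eps, delta)
        d_r = delta_t(t - 1 / eps, eps, delta)   # r = t - 1/eps
        assert 0 <= d_t <= delta
        assert d_t - d_r >= delta * mu ** (-t / 4) / 8
    return True

print(check_delta(eps=0.05, delta=0.5, t_max=400))
```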

For $t \ge 0$, let $E_t$ be the event that $N_t > (1+\delta_t)N_0\mu^t$ holds but $N_s \le (1+\delta_s)N_0\mu^s$ for all $0 \le s < t$. Suppose that the upper bound in (54) fails for some $t$. Then, since $\delta_t \le \delta$, we have $N_t > (1+\delta_t)N_0\mu^t$ for this $t$, and it follows that one of the events $E_t$ holds.

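Suppose that $E_t$ holds for some $t \ge 0$. Set $r = t - 1/\varepsilon$, and, for convenience, set $N_s = \mu^s N_0$ for all negative integers $s$, so $\sum_{s<0} N_s = N_0/\varepsilon$. Then, with all sums starting at $-\infty$ unless otherwise indicated,

\begin{align*}
S_t + N_0/\varepsilon = \sum_{s<t} N_s &\le \sum_{s<t} (1+\delta_s)\mu^s N_0 \\
&\le \sum_{s<r} (1+\delta_r)\mu^s N_0 + \sum_{r \le s < t} (1+\delta_t)\mu^s N_0 \\
&= \sum_{s<t} (1+\delta_t)\mu^s N_0 - \sum_{s<r} (\delta_t-\delta_r)\mu^s N_0 \\
&= \frac{N_0}{\varepsilon}\Bigl( (1+\delta_t)\mu^t - (\delta_t-\delta_r)\mu^r \Bigr) \\
&\le \frac{N_0}{\varepsilon}\Bigl( (1+\delta_t)\mu^t - (\delta_t-\delta_r)\mu^t/4 \Bigr),
\end{align*}

since $\mu^{t-r} \le (1+\varepsilon)^{1/\varepsilon} < e < 4$.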

For each fixed $i$, let $f(i) = (1+\varepsilon)i$ denote the expectation of $B_i = \sum_{j=1}^{i} A_j$. From the above, we have

\[ f(S_t) - S_t + N_0 = \varepsilon S_t + N_0 = \varepsilon(S_t + N_0/\varepsilon) \le N_0(1+\delta_t)\mu^t - N_0(\delta_t-\delta_r)\mu^t/4. \]

On the other hand, since $E_t$ holds we have $N_t > (1+\delta_t)\mu^t N_0$, so from (55) it follows that

\[ B_{S_t} - S_t + N_0 = N_t > (1+\delta_t)\mu^t N_0. \]

Combining the two equations above, using (56), and recalling that $N_0 = \omega/\varepsilon$, we see that

\[ B_{S_t} - f(S_t) > N_0(\delta_t-\delta_r)\mu^t/4 \ge N_0(\delta\mu^{-t/4}/8)\mu^t/4 = \delta\omega\varepsilon^{-1}\mu^{3t/4}/32. \tag{57} \]



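On the other hand, from the bound on $S_t + N_0/\varepsilon$ above we have, very crudely,

\[ S_t \le \frac{N_0}{\varepsilon}(1+\delta_t)\mu^t \le 2\omega\varepsilon^{-2}\mu^t. \tag{58} \]

From (57) and (58) it follows that $B_{S_t} - f(S_t) \ge g(S_t)$, where

\[ g(i) = \max\bigl\{ \delta\omega\varepsilon^{-1}/32,\; i^{3/4}\delta\omega^{1/4}\varepsilon^{1/2}/60 \bigr\}. \]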



Let $F_i$ be the event that $B_i - f(i) = B_i - \mathbb{E}B_i \ge g(i)$. We have shown that if one of the events $E_t$ holds, then so does one of the events $F_i$. At this point we could simply bound the probability of the union of the $F_i$ by the sum of their probabilities, but as they are highly dependent, this is rather inefficient.

Let $T = \lceil \omega/\varepsilon^2 \rceil$, noting that $T \ge 100$ and $T \le 2\omega/\varepsilon^2$. For $k = 0, 1, 2, \ldots$, let $G_k$ be the event $\bigcup_{kT \le i < (k+1)T} F_i$, so

\[ \mathbb{P}\Bigl(\bigcup_{t \ge 1} E_t\Bigr) \le \mathbb{P}\Bigl(\bigcup_{i \ge 1} F_i\Bigr) = \mathbb{P}\Bigl(\bigcup_{k \ge 0} G_k\Bigr) \le \sum_{k=0}^{\infty} \mathbb{P}(G_k). \]

Finally, let $G'_k$ be the event that $B_{(k+2)T} - \mathbb{E}B_{(k+2)T} \ge g(kT)$. Let us estimate $\mathbb{P}(G'_k \mid G_k)$. We test whether $G_k$ holds by examining each $B_i$ in turn, stopping at the first $i \ge kT$ for which $F_i$ holds. Suppose $G_k$ does hold, and that we stop at $i = i'$, so $kT \le i' < (k+1)T$. Recalling that $B_i = \sum_{j \le i} A_j$, where the $A_j$ are independent with distribution $\mathrm{Bi}(n, \mu/n)$, we have not yet examined any $A_j$, $j > i'$. Hence the conditional distribution of $A = B_{(k+2)T} - B_{i'} = \sum_{i' < j \le (k+2)T} A_j$ is just its unconditional distribution, which is binomial with mean $(1+\varepsilon)((k+2)T - i') \ge T \ge 100$. It is easy to check (for example from the Berry-Esseen Theorem) that this binomial distribution (which is well approximated by a normal distribution) exceeds its mean with probability at least 1/3. But when this happens,

\[ B_{(k+2)T} - \mathbb{E}B_{(k+2)T} = B_{i'} - \mathbb{E}B_{i'} + A - \mathbb{E}A \ge B_{i'} - \mathbb{E}B_{i'} \ge g(i') \ge g(kT), \]

since we are assuming $F_{i'}$ holds, and $g(\cdot)$ is non-decreasing. Thus, given $G_k$, the event $G'_k$ holds with probability at least 1/3. Hence $\mathbb{P}(G'_k) \ge \mathbb{P}(G_k)/3$, so $\mathbb{P}(G_k) \le 3\mathbb{P}(G'_k)$.

Now $G'_0$ is the event that $B_{2T}$, which has a binomial distribution with mean $\mu_0 = (1+\varepsilon)2T \le 4T \le 8\omega\varepsilon^{-2}$, exceeds its mean by at least $x_0 = \delta\omega\varepsilon^{-1}/32$. Since $x_0 \le \mu_0$, Lemma 6 applies, and we see that $\mathbb{P}(G'_0) \le \exp(-x_0^2/(3\mu_0)) \le \exp(-\delta^2\omega/24576)$.

For $k \ge 1$, $G'_k$ is the event that $B_{(k+2)T}$, which has a binomial distribution with mean $\mu_k = (1+\varepsilon)(k+2)T \le 24k\omega\varepsilon^{-2}$, exceeds its mean by $x_k = g(kT) \ge g(k\omega\varepsilon^{-2}) \ge k^{3/4}\delta\omega\varepsilon^{-1}/60$. Since $x_k \le \mu_k$, by Lemma 6 we have $\mathbb{P}(G'_k) \le \exp(-c_0 k^{1/2}\delta^2\omega)$ for some absolute constant $c_0 > 0$. Hence

\[ \mathbb{P}(\exists t : E_t) \le 3\sum_{k \ge 0} \mathbb{P}(G'_k) \le 3e^{-c_0\delta^2\omega} + 3\sum_{k \ge 1} e^{-c_0 k^{1/2}\delta^2\omega} = O(e^{-c_0\delta^2\omega}), \]

recalling that $\delta^2\omega \ge 100$. As noted above, if the upper bound in (54) fails, then some $E_t$ holds, so we have proved that the upper bound holds with the required probability.

The argument for the lower bound is almost identical. Let $E'_t$ be the event that $N_t < (1-\delta_t)N_0\mu^t$ holds but $N_s \ge (1-\delta_s)N_0\mu^s$ for all $s < t$. Changing signs in the argument above, we see that if $E'_t$ holds then the equivalent of (57) holds, namely

\[ B_{S_t} - f(S_t) < -\delta\omega\varepsilon^{-1}\mu^{3t/4}/32. \tag{59} \]

We have already shown that (58) holds with the required probability. If (58) does hold, then (59) implies that $B_i - f(i) \le -g(i)$ holds for some $i$. We may bound the probability of this event just as for $F_i$ above, completing the proof. □

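Turning to the graph, Lemma 13 enables us to prove the required growth result. There are various possible choices of the parameters here; we make one (somewhat arbitrary) choice that will be useful later.

Lemma 14. Let $\varepsilon = \varepsilon(n) \le 1$ satisfy $\Lambda = \varepsilon^3 n \to \infty$. Set $\lambda = 1+\varepsilon$, $\omega = \Lambda^{1/6}$, and $M = \sqrt{\omega\varepsilon n}$. For $x \in V(G(n,\lambda/n))$ and $r \ge 0$, let $E_{x,r}$ be the event that

\[ (1 - 2\omega^{-1/3})\lambda^t |\Gamma_r(x)| \le |\Gamma_{r+t}(x)| \le (1 + \omega^{-1/3})\lambda^t |\Gamma_r(x)| \]

holds for $0 \le t \le T = \log(\varepsilon M/\omega)/\log\lambda$. Then

\[ \mathbb{P}\bigl(E_{x,r} \mid |\Gamma_0(x)|, \ldots, |\Gamma_r(x)|\bigr) \ge 1 - O(\exp(-c_0\omega^{1/3})) = 1 - o(\Lambda^{-100}) \]

whenever $\omega/\varepsilon \le |\Gamma_r(x)| \le 2\omega/\varepsilon$ and $\sum_{r' \le r} |\Gamma_{r'}(x)| \le n^{2/3}$.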

In other words, once we reach size $\omega/\varepsilon$ in the neighbourhood exploration, provided we have not so far used up too many vertices, the neighbourhoods grow at the expected rate until they reach size $M$. Note that if $\Lambda = \varepsilon^3 n \ge (\log n)^{20}$, then the error term in the form $O(\exp(-c_0\omega^{1/3}))$ is $o(n^{-100})$, i.e., utterly negligible.

Proof. Condition on the exploration up to step $r$, assuming that we find between $\omega/\varepsilon$ and $2\omega/\varepsilon$ vertices in the last generation and have seen at most $n^{2/3}$ vertices so far. Let $N'_t = |\Gamma_{r+t}(x)|$. The (conditional) distribution of the process $(N'_t)_{t \ge 0}$ is very similar to that of $(N_t)$: the only difference is that each vertex gives rise to a binomial $\mathrm{Bi}(m, \lambda/n)$ number of children in the next generation, where $m$ is the number of vertices not seen so far.

For the upper bound on the neighbourhood sizes, we simply note that $m \le n$, so $(N'_t)$ is stochastically dominated by $(N_t)$. The result thus follows immediately from Lemma 13.

For the lower bound, set $n' = n - 2n^{2/3}$. Note that if the upper bound holds, which it does with probability $1 - O(\exp(-c_0\omega^{1/3}))$, then by time $T$ we have used at most $n^{2/3} + 10M/\varepsilon \le 2n^{2/3}$ vertices, so we still have at least $n'$ left. As long as we have used up at most $2n^{2/3}$ vertices, the process $(N'_t)$ stochastically dominates a process $(N''_t)$ in which each particle has $\mathrm{Bi}(n', \lambda/n)$ children. This binomial has mean $\mu = \lambda n'/n = (1+\varepsilon)(1 - 2n^{-1/3})$. Applying Lemma 13 again, it follows that with probability $1 - O(\exp(-c_0\omega^{1/3}))$ we have $|\Gamma_{r+t}(x)| \ge (1 - \omega^{-1/3})\mu^t |\Gamma_r(x)|$ for $0 \le t \le T$. Since $T = \log(\varepsilon^{3/2} n^{1/2} \omega^{-1/2})/\log\lambda \le \log(\Lambda^{1/2})/(\varepsilon/2) = \varepsilon^{-1}\log\Lambda \le n^{1/3}/\omega$, we have $\mu^t/\lambda^t \ge 1 - 3/\omega$ for $t \le T$, so the lower bound follows. □

Remark 3. Let us note that, while the various constants can certainly be improved, Lemmas 13 and 14 are tight in several ways. Firstly, since the survival probability of the branching process $\mathfrak{X}_\lambda = (X_t)$ is of order $\varepsilon$, if we start from a neighbourhood $\Gamma_r(x)$ of size $a/\varepsilon$, the neighbourhood exploration process will die quickly with probability $e^{-\Theta(a)}$. Hence, in order to make it very likely that the neighbourhoods grow at the right rate, we certainly need $|\Gamma_r(x)|$ to be much larger than $1/\varepsilon$. In other words, neighbourhoods are only 'large' over size $\omega/\varepsilon$, for some $\omega \to \infty$. Similarly, the form $\exp(-\Omega(\delta^2\omega))$ of the error bound in Lemma 13 is best possible.

These comments demonstrate a recurring theme in this section, namely that the behaviour of the branching process should be viewed on a 'size scale' of $1/\varepsilon$. The proof of Lemma 13 also illustrates another recurring theme, that the appropriate 'time scale' is also $1/\varepsilon$.

Finally, when $\varepsilon$ is close to the lower end of the range we consider, we cannot extend Lemma 14 to growth much beyond size $\sqrt{\varepsilon n}$; shortly beyond this point, the number of vertices 'used up' is sufficient to slow the growth appreciably. Fortunately, neighbourhoods of two different vertices are likely to join up when they have size around $\sqrt{\varepsilon n}$, as we shall now see.

Knowing the growth rate of neighbourhoods once they become large, it is easy to 
determine the distribution of the time taken by the large neighbourhoods of two vertices 
to join up. 

For $x \in V(G)$ and $a > 0$, let $t_a(x)$ denote the smallest $r$ for which $|\Gamma_r(x)| \ge a$, if such an $r$ exists; otherwise $t_a(x)$ is undefined.

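Lemma 15. Let $\varepsilon = \varepsilon(n)$ and $\lambda = 1+\varepsilon$ be such that $\varepsilon \to 0$ and $\Lambda = \varepsilon^3 n \to \infty$. Set $\omega = \Lambda^{1/6}$, and $t_2 = \log(\varepsilon^3 n/\omega^2)/\log\lambda$. Let $x$ and $y$ be two vertices of $G(n,\lambda/n)$. Writing $E$ for the event that $t_{\omega/\varepsilon}(x) = r_1$, $t_{\omega/\varepsilon}(y) = r_2$, and the graphs $G_{\le r_1}(x)$ and $G_{\le r_2}(y)$ each contain at most $n^{2/3}$ vertices and are disjoint, we have

\[ \mathbb{P}\bigl(d(x,y) > r_1 + r_2 + t_2 + a \mid E\bigr) = e^{-(1+o(1))\lambda^a} + O(e^{-c_0\omega^{1/3}}) = e^{-(1+o(1))\lambda^a} + o(\Lambda^{-10}) \tag{60} \]

for any function $a = a(n) \ge -t_2/2$, and

\[ \mathbb{P}\bigl(d(x,y) \le r_1 + r_2 + t_2 - K \mid E\bigr) = o(1) \]

whenever $K = K(n)$ is such that $\varepsilon K \to \infty$.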

Proof. It suffices to prove the first statement: since $\log\lambda = \Theta(\varepsilon)$, if $\varepsilon K \to \infty$ then $K\log\lambda \to \infty$, so $\lambda^{-K} \to 0$, and the second statement follows immediately from the first. In proving (60), we may assume that $a \le a_{\max} = \log\omega/(2\log\lambda)$: otherwise, $\lambda^a \ge \omega^{1/2}$, and the error term in (60) dominates the main term. Then (60) asserts only an upper bound on the relevant probability, which follows from the same bound for $a = \lfloor a_{\max} \rfloor$.

We explore the neighbourhoods of $x$ and $y$ in the usual way, initially stopping each exploration when we first reach a neighbourhood of size at least $\omega/\varepsilon$. At this point, the conditions of the lemma allow us to assume that we have used up at most $n^{2/3}$ vertices in each exploration, and that the explorations have not met. Using the Chernoff bound Lemma 6 it is easy to check that with probability $1 - o(e^{-\omega})$ the last generation in each exploration has size at most $(1+\sqrt{\varepsilon})\omega/\varepsilon \sim \omega/\varepsilon$; otherwise, a generation of size less than $\omega/\varepsilon$ has many more neighbours than it should, by about $\sqrt{\omega}$ standard deviations; see Lemma 28 for a similar argument.

We now continue both explorations. At the start of step $i$, $i = 0, 1, 2, \ldots$, we have explored the neighbourhoods of $x$ out to distance $r_1 + \lceil i/2 \rceil$ and those of $y$ out to distance $r_2 + \lfloor i/2 \rfloor$. During step $i$, we first test whether any of the $|\Gamma_{r_1+\lceil i/2 \rceil}(x)|\,|\Gamma_{r_2+\lfloor i/2 \rfloor}(y)|$ 'cross-edges' between these two neighbourhoods is present. If so, $d(x,y) = r_1 + r_2 + i + 1$, and we stop. Otherwise, we uncover the next neighbourhood of $x$ or $y$ as appropriate and continue, stopping if we have found no cross-edge by step $t_2 + a$, in which case $d(x,y) > r_1 + r_2 + t_2 + a$.
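
The alternating exploration just described can be sketched in Python on an explicit graph. This is an illustration only: the proof reveals the edges of a random graph as it goes and starts from radii $r_1$, $r_2$, whereas this hypothetical `meet_in_the_middle` helper takes a fixed adjacency dictionary and starts both explorations at radius 0.

```python
def next_layer(adj, frontier, seen):
    """Vertices adjacent to the current layer that have not been reached before."""
    layer = {w for v in frontier for w in adj[v] if w not in seen}
    seen |= layer
    return layer

def meet_in_the_middle(adj, x, y):
    """Graph distance d(x, y), or None if x and y lie in different components."""
    if x == y:
        return 0
    layer_x, layer_y = {x}, {y}
    seen_x, seen_y = {x}, {y}
    i = 0
    while layer_x and layer_y:
        # Step i: layers at distance ceil(i/2) from x and floor(i/2) from y.
        if any(w in layer_y for v in layer_x for w in adj[v]):
            return i + 1                      # a cross-edge closes a shortest path
        if i % 2 == 0:
            layer_x = next_layer(adj, layer_x, seen_x)
        else:
            layer_y = next_layer(adj, layer_y, seen_y)
        i += 1
    return None

cycle = {v: [(v - 1) % 10, (v + 1) % 10] for v in range(10)}
print(meet_in_the_middle(cycle, 0, 5), meet_in_the_middle(cycle, 0, 3))
```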

After $t_2 + a_{\max}$ steps as above, the typical size of the neighbourhood of $x$ or $y$ reached is $(\omega/\varepsilon)\lambda^{(t_2+a_{\max})/2}$, which is roughly $\omega^{1/4}\sqrt{\varepsilon n}$. In particular, this size is much less than the quantity $M$ defined in Lemma 14. Hence, by Lemma 14 we may assume that

\[ |\Gamma_{r_i+j}(x_i)| \sim \lambda^j |\Gamma_{r_i}(x_i)| \sim \lambda^j \omega/\varepsilon \]

for $i = 1, 2$ and all $j \le (t_2 + a_{\max})/2$, where $x_1 = x$ and $x_2 = y$. Furthermore, the error terms, which are factors of the form $(1 + O(\sqrt{\varepsilon}) + O(\omega^{-1/3}))$, are uniform in $j$.

It follows that at step $i$ we test $(1+o(1))\lambda^i(\omega/\varepsilon)^2$ potential cross-edges, and that by any step $i \ge t_2/2$ we have tested in total

\[ (1+o(1)) \sum_{j=0}^{i} \lambda^j (\omega/\varepsilon)^2 \sim (\omega/\varepsilon)^2 \sum_{j=-\infty}^{i} \lambda^j \sim \omega^2\varepsilon^{-3}\lambda^i \]

potential cross-edges. (The bound $i \ge t_2/2$ is used for convenience only, to allow us to approximate the sum from $j = 0$ by the sum from $j = -\infty$.)

Since each cross-edge tested is present with its original unconditional probability of $\lambda/n \sim 1/n$, it follows that up to a $O(e^{-c_0\omega^{1/3}})$ error term (from the conclusion of Lemma 14 not holding, etc), the probability that the explorations do not meet by step $t_2 + a$ is

\[ p_a = (1 - \lambda/n)^{(1+o(1))\omega^2\varepsilon^{-3}\lambda^{t_2+a}}. \]

Since

\[ \log(1/p_a) \sim (1/n)\,\omega^2\varepsilon^{-3}\lambda^{t_2}\lambda^a = \lambda^a, \]

the proof is complete. □
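
For concreteness, the parameters in Lemma 15 and the leading-order value of $\log(1/p_a)$ can be evaluated numerically. The Python sketch below is illustrative only; the function names are hypothetical and the $(1+o(1))$ factor in the exponent is dropped. It checks that $\log(1/p_a)/\lambda^a$ is close to 1 and that $2\log\omega/\log\lambda + t_2 = \log(\varepsilon^3 n)/\log\lambda$.

```python
import math

def meeting_parameters(n, eps):
    """Return (lambda, Lambda, omega, t_2) as defined in Lemma 15."""
    lam = 1 + eps
    big_lambda = eps**3 * n
    omega = big_lambda ** (1 / 6)
    t2 = math.log(big_lambda / omega**2) / math.log(lam)
    return lam, big_lambda, omega, t2

def log_one_over_p(n, eps, a):
    """Leading-order log(1/p_a), ignoring the (1 + o(1)) factor in the exponent."""
    lam, _, omega, t2 = meeting_parameters(n, eps)
    tested = omega**2 * eps**-3 * lam ** (t2 + a)   # cross-edges tested by step t_2 + a
    return -tested * math.log1p(-lam / n)

n, eps = 10**9, 0.01
lam, big_lambda, omega, t2 = meeting_parameters(n, eps)
for a in (-50, 0, 50):
    print(a, log_one_over_p(n, eps, a) / lam**a)
```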

Roughly speaking, Lemma 15 tells us that once the neighbourhoods of two vertices reach a decent size, $\omega/\varepsilon$, then whp these neighbourhoods meet within $O(1/\varepsilon)$ steps of 'when they should'. To study the diameter of $G(n,\lambda/n)$, we shall need the full strength of the bound actually proved. Note that there is variation of order $1/\varepsilon$ in the actual time taken to meet, as may be seen from (60), where the values of $a$ giving probabilities bounded away from 0 and 1 have $a = O(1/\varepsilon)$. Note that the neighbourhoods meet when they have size around $\sqrt{\varepsilon n}$, rather than $\sqrt{n}$ as one might expect. The reason is that once the neighbourhoods reach size around $\sqrt{\varepsilon n}$, they have probability around $\varepsilon$ of meeting at each step. But it takes $1/\varepsilon$ steps until the neighbourhoods grow much in size, so they have $1/\varepsilon$ chances to meet in this size range. The fact that neighbourhoods typically join up when they each have size $\sqrt{\varepsilon n}$ explains one factor of $\varepsilon$ in the first log in (5).

In the light of Lemma 15, as in the case of $\lambda$ constant or $\lambda \to \infty$, the key to understanding the diameter of $G(n,\lambda/n)$ is understanding the distribution of the time taken until the neighbourhoods of a vertex reach a reasonable size (now $\omega/\varepsilon$); this will be our aim in the next few subsections. Note that there is a wide flexibility in the choice of $\omega$ that will work: the requirements are that $\omega$ is at least a certain power of $\log\Lambda$, and at most a certain power of $\Lambda$. If $\Lambda$ is large enough to allow $\omega/\log n \to \infty$, then many arguments simplify; we shall not assume this, however.

In the remainder of this section we explain why Lemma 15 already gives us the typical distance between vertices in the giant component, if $\Lambda = \varepsilon^3 n$ is at least $(\log n)^{20}$, say. Indeed, the neighbourhoods of a random vertex of $G = G(n,\lambda/n)$ behave much like the branching process $\mathfrak{X}_\lambda = (X_t)_{t \ge 0}$, at least to start with. Roughly speaking, a vertex is in the giant component if and only if the corresponding branching process survives, which it does with probability $s \sim 2\varepsilon$. So we will be interested in the expected size of $|X_t|$ conditioned on the process surviving.

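Lemma 16. Let $S$ be the event that $\mathfrak{X}_\lambda$ survives. Then

\[ \mathbb{E}(|X_t| \mid S) = \frac{\lambda^t - (1-s)\lambda_*^t}{s}, \]

which is asymptotically $\lambda^t/s \sim \lambda^t/(2\varepsilon)$ if $\varepsilon \to 0$ and $\varepsilon t \to \infty$.

Proof. Writing $1_A$ for the indicator function of an event $A$, we have

\[ \mathbb{E}(|X_t| 1_S) = \mathbb{E}(|X_t|) - \mathbb{E}(|X_t| 1_{S^c}) = \lambda^t - (1-s)\,\mathbb{E}(|X_t| \mid S^c) = \lambda^t - (1-s)\lambda_*^t, \]

since the distribution of $\mathfrak{X}_\lambda$ conditioned on $S^c$ is that of $\mathfrak{X}_{\lambda_*}$. The result follows. □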
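
The formula in Lemma 16 is easy to evaluate numerically. The following Python sketch is illustrative and not from the paper; the function names are hypothetical. It finds $s$ as the positive root of $1 - s = e^{-\lambda s}$ by bisection and uses the standard duality identity $\lambda_* = \lambda(1-s)$, which follows by substituting into $\lambda_* e^{-\lambda_*} = \lambda e^{-\lambda}$.

```python
import math

def survival_probability(lam):
    """Positive root s of 1 - s = exp(-lam * s), for lam > 1, by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        # 1 - s - exp(-lam * s) is positive on (0, s) and negative on (s, 1].
        if 1 - mid - math.exp(-lam * mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def conditional_mean(lam, t):
    """E(|X_t| | survival) = (lam^t - (1 - s) * lam_star^t) / s."""
    s = survival_probability(lam)
    lam_star = lam * (1 - s)        # dual parameter, lam_star < 1
    return (lam**t - (1 - s) * lam_star**t) / s

eps = 0.01
lam = 1 + eps
s = survival_probability(lam)
print(s / (2 * eps), conditional_mean(lam, 2000) * s / lam**2000)
```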

It is not hard to see that the 'typical' size of $|X_t|$ given $S$ is of the same order as the expected size; we shall give some precise results on this later. Hence, for most vertices in the giant component, their neighbourhoods take time $\log\omega/\log\lambda$ to reach size $\omega/\varepsilon$, so the typical distance is $2\log\omega/\log\lambda + t_2 = \log(\varepsilon^3 n)/\log\lambda$. More precisely, one can check that the distance between two random vertices of the giant component is $\log(\varepsilon^3 n)/\log\lambda + O_p(1/\varepsilon)$; we shall not give the details.

4.2 Branching process to graph 

At some point, we need to compare the probabilities of events defined in terms of our random graph $G = G(n,\lambda/n)$ with events in the branching process. It turns out that we have to consider events involving trees of height $\Theta(\log\Lambda/\log\lambda) = \Theta(\varepsilon^{-1}\log\Lambda)$, recalling that $\Lambda = \varepsilon^3 n$, with (it will turn out) around $1/\varepsilon$ vertices at each distance from the root. In other words, we need to consider trees with at least $\Theta(\varepsilon^{-2})$ vertices. If $\varepsilon$ is smaller than $n^{-1/4}$, then we cannot simply extend Lemma 5 to cover such trees using the same proof, since the error terms $|T|^2/n$ would be too large.

Fortunately, it is easy to prove a result that applies for the trees we need. Although this is in some sense a coupling result, the obvious coupling between $G^0_{\le t}(x)$ and $\mathfrak{X}_\lambda$ fails here. This obvious coupling is based on the fact that a $\mathrm{Po}(\lambda)$ and a $\mathrm{Po}(\lambda(1-\delta))$ distribution can naturally be coupled to agree with probability at least $1 - \lambda\delta$. In fact, much better couplings are possible. Recall that $X_{\le t}$ denotes the first $t$ generations of $\mathfrak{X}_\lambda$, seen as a rooted tree.

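Lemma 17. Let $\lambda = 1+\varepsilon$, where $\varepsilon = \varepsilon(n) = O(1)$. Let $\delta(n)$ be any function with $\delta > 0$ and $\delta \to 0$ as $n \to \infty$. Let $t = t(n) \ge 0$ and let $T = T(n)$ be a rooted tree of height $t$ with $\varepsilon|T|^2 \le \delta n$, each generation of size at most $n^{1/3}$, and $|T| \le \delta n^{2/3}$. Then

\[ \mathbb{P}(G^0_{\le t}(x) \cong T) \sim \mathbb{P}(X_{\le t} \cong T) \]

and

\[ \mathbb{P}(G_{\le t}(x) \cong T) \sim \mathbb{P}(X_{\le t} \cong T), \]

where the asymptotics is uniform over all such sequences $T(n)$.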

Proof. Rather than couple, we simply calculate directly; it is convenient to order the vertices first. When constructing $\mathfrak{X}_\lambda$ starting from $X_0$, let us number the particles $1, 2, 3, \ldots$ in the order they appear, so the initial particle is $x_1$, and we test particles in numerical order to see how many children they have. We number the vertices uncovered in the neighbourhood exploration process by which we find $G^0_{\le t}(x)$ analogously, this time using any (deterministic or random) rule to decide in which order to number the children of a vertex.

For each numbering $T^*$ of $T$ that can arise in such an exploration, let $E_1(T^*)$ be the event that $X_{\le t}$ is isomorphic to $T^*$ with the labels matching. Then $\{X_{\le t} \cong T\}$ is the disjoint union of the events $E_1(T^*)$, where $T^*$ runs over all numberings of $T$; note that these events are equiprobable. Similarly, let $E_2(T^*)$ be the event that $G^0_{\le t}(x) = T^*$ with labels matching, so $\{G^0_{\le t}(x) \cong T\}$ is the disjoint union of the $E_2(T^*)$. Fix one particular numbering $T^*$. Since the probabilities of $E_1(T^*)$ and $E_2(T^*)$ do not depend on the numbering, it suffices to show that $\mathbb{P}(E_1(T^*)) \sim \mathbb{P}(E_2(T^*))$.

Let $r$ be the number of vertices of $T$ at distance $t$ from the root, and $m = |T|$ the total number of vertices. For $1 \le i \le m-r$, let $d_i$ denote the number of children in $T$ of the $i$th vertex. Now $E_1(T^*)$ is simply the event that for $i = 1, \ldots, m-r$, the $i$th particle of the branching process has exactly $d_i$ children. Thus,

\[ \mathbb{P}(E_1(T^*)) = \prod_{i=1}^{m-r} \frac{\lambda^{d_i}}{d_i!} e^{-\lambda}. \]
Similarly, $E_2(T^*)$ is the event that for every $i$, when exploring the neighbours of the $i$th vertex reached, we find exactly $d_i$ new neighbours. Let $u_i = 1 + \sum_{j<i} d_j$ denote the number of vertices already 'used' (reached) at the point that we look for new neighbours of the $i$th vertex. Then

\[ \mathbb{P}(E_2(T^*)) = \prod_{i=1}^{m-r} \mathbb{P}\bigl(\mathrm{Bi}(n-u_i, \lambda/n) = d_i\bigr) = \prod_{i=1}^{m-r} \frac{(n-u_i)_{(d_i)}}{d_i!} (\lambda/n)^{d_i} (1-\lambda/n)^{n-u_i-d_i}. \]

Hence,

\[ \rho = \frac{\mathbb{P}(E_2(T^*))}{\mathbb{P}(E_1(T^*))} = \prod_{i=1}^{m-r} \frac{(n-u_i)_{(d_i)}}{n^{d_i}} \, \frac{(1-\lambda/n)^{n-u_i-d_i}}{e^{-\lambda}}. \]

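Since $n - u_i \ge n/2$ and $d_i$ is bounded by $n^{1/3}$, we have $(n - u_i - j) = (n - u_i)e^{O(n^{-2/3})}$ for $0 \le j \le d_i$, so

\[ \log\frac{(n-u_i)_{(d_i)}}{n^{d_i}} = -\frac{d_i u_i}{n} + O\bigl(d_i(u_i/n)^2\bigr) + O(n^{-2/3}d_i) = -\frac{d_i u_i}{n} + O(n^{-2/3}d_i), \]

using $u_i \le |T| \le n^{2/3}$ in the last step. Also,

\begin{align*}
(n - u_i - d_i)\log(1 - \lambda/n) &= (n - u_i - d_i)\bigl(-\lambda/n + O(1/n^2)\bigr) \\
&= -\lambda + u_i\lambda/n + O(d_i/n) + O(n^{-1}).
\end{align*}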

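Hence,

\[ \log\rho = \sum_{i=1}^{m-r} \Bigl( \frac{(\lambda - d_i)u_i}{n} + O(n^{-2/3}d_i) + O(n^{-1}) \Bigr) = o(1) + \sum_{i=1}^{m-r} \frac{(\lambda - d_i)u_i}{n}, \]

using $\sum d_i = m - 1 = o(n^{2/3})$.

Now $\lambda = 1+\varepsilon$, and $\sum u_i \le m^2$. By assumption $\varepsilon m^2 = o(n)$, so

\[ \log\rho = o(1) + \sum_{i=1}^{m-r} (1 - d_i)\frac{u_i}{n} = o(1) + \sum_{i=1}^{m-r} \frac{u_i}{n} - \sum_{i=1}^{m-r} d_i\frac{u_i}{n}. \]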

We can rewrite the final sum as $\sum_{i=1}^{m-r} \sum_j u_i/n$, where $j$ runs over the children of $i$. Each $j$ in the range 2 up to $m$ appears exactly once in the double sum, so the sum is equal to $\sum_{j=2}^{m} u_{j'}/n$, where $j'$ is the parent of $j$. For any vertex $j$, the vertex $j'$ is in the generation before $j$, so $u_j - u_{j'}$ is at most twice the maximum number of vertices in a generation. We have assumed this maximum is at most $n^{1/3}$, so $|u_j - u_{j'}| \le 2n^{1/3}$. Hence,

\begin{align*}
\log\rho &= o(1) + \sum_{i=1}^{m-r} \frac{u_i}{n} - \sum_{i=2}^{m} \frac{u_{i'}}{n} \\
&= o(1) + \frac{u_1}{n} + \sum_{i=2}^{m-r} \frac{u_i - u_{i'}}{n} - \sum_{i=m-r+1}^{m} \frac{u_{i'}}{n} \\
&= o(1) + o(1) + O(mn^{-2/3}) - O(rm/n) = o(1) + O(rm/n) = o(1),
\end{align*}

and the first statement follows.

For the second, it suffices to prove that $\mathbb{P}(G_{\le t}(x) \cong T \mid G^0_{\le t}(x) \cong T) \sim 1$. But this is immediate since there are at most $n^{1/3}|T| = o(n)$ extra edges that we must test. □
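
To see the size of $\log\rho$ in a concrete case, the following Python sketch computes it exactly from the two product formulas in the proof, for a given sequence $d_1, d_2, \ldots$ of offspring counts in exploration order. It is an illustration only; the function name and the choice of a complete binary tree are assumptions made for the example.

```python
import math

def log_rho(n, lam, degrees):
    """log of P(E_2(T*)) / P(E_1(T*)); degrees[i - 1] = d_i in exploration order."""
    total = 0.0
    used = 1                         # u_1 = 1: only the root has been reached
    for d in degrees:
        free = n - used              # vertices not yet reached
        # log P(Bi(free, lam/n) = d), with the falling factorial summed term by term
        log_binomial = (sum(math.log(free - j) for j in range(d)) - math.lgamma(d + 1)
                        + d * math.log(lam / n) + (free - d) * math.log1p(-lam / n))
        # log P(Po(lam) = d)
        log_poisson = d * math.log(lam) - lam - math.lgamma(d + 1)
        total += log_binomial - log_poisson
        used += d                    # u_{i+1} = u_i + d_i
    return total

binary_tree = [2] * 31               # internal vertices of a complete binary tree of height 5
for n in (10**3, 10**6, 10**9):
    print(n, log_rho(n, 1.01, binary_tree))
```

For small $n$ the hypotheses of Lemma 17 fail for this tree and $\log\rho$ is far from 0; as $n$ grows it tends to 0, in line with the lemma.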
For any fixed $k$, Lemma 17 extends to $k$ starting vertices and $k$ trees, with virtually the same proof.

Lemma 18. Fix $k \ge 2$. Let $\lambda = 1+\varepsilon$, where $\varepsilon = \varepsilon(n) = O(1)$. Let $\delta(n)$ be any function with $\delta > 0$ and $\delta \to 0$ as $n \to \infty$. Let $T_1, \ldots, T_k$ be rooted trees, with $\varepsilon|T_i|^2 \le \delta n$, each generation of $T_i$ of size at most $n^{1/3}$, and $|T_i| \le \delta n^{2/3}$. Given distinct vertices $x_1, \ldots, x_k$ of $G = G(n,\lambda/n)$, let $E = E(x_1, \ldots, x_k, T_1, \ldots, T_k)$ denote the event that $G_{\le t_i}(x_i) \cong T_i$ for $1 \le i \le k$, and $d(x_i, x_j) > t_i + t_j$ for $1 \le i < j \le k$, where $t_i$ is the height of $T_i$. Then

\[ \mathbb{P}(E) \sim \prod_{i=1}^{k} \mathbb{P}(X_{\le t_i} \cong T_i), \]

where the asymptotics is uniform over all choices of $T_1, \ldots, T_k$. □

In other words, the event that the $t_i$-neighbourhood of each $x_i$ is isomorphic to $T_i$, and these neighbourhoods are disjoint, has asymptotically the probability suggested by independent branching processes. One can prove Lemma 18 by adapting the proof of Lemma 17 in the obvious ways. Alternatively, it follows from Lemma 17 by simple calculations.

4.3 Slow initial growth: the branching process 

In this subsection we study the probability that the branching process $\mathfrak{X}_\lambda$ survives, but takes much longer than usual to reach generations of some large size. One might expect the results we need to be in the branching process literature, and perhaps they are. However, we have not found them. The key point is that here $\lambda$ is variable, tending down to 1 from above, so results for fixed $\lambda$ are not of much use. Furthermore, although there is a natural scaling limit as $\lambda \to 1$ from above (described below), results about this limit are not directly applicable either: we wish to consider events of probability around $1/n$, and this probability tends to 0 as $\lambda \to 1$. In other words, we need explicit bounds on the rate of convergence of some properties of the branching process as $\lambda \to 1$. Fortunately, as is often the case, the branching process results we need are not hard to prove directly.

With $\lambda = 1+\varepsilon > 1$ fixed for the moment, let $\mathfrak{X}_\lambda = (X_t)_{t \ge 0}$ and $\mathfrak{X}^+_\lambda = (X^+_t)_{t \ge 0}$ be defined as before, so $X^+_t \subseteq X_t$ is the set of particles in $X_t$ which have descendants in all future generations. Recall that $\mathfrak{X}^+_\lambda$ is again a Galton-Watson branching process, with $|X^+_0| = 1$ or 0 depending on whether $\mathfrak{X}_\lambda$ survives, and with offspring distribution $Z_\lambda$. Here, as before, $Z_\lambda$ denotes the distribution of a Poisson $\mathrm{Po}(s\lambda)$ random variable conditioned to be at least 1, where $s = s(\lambda)$ is the probability that $\mathfrak{X}_\lambda$ survives for ever.

From standard results (see Athreya and Ney [3], for example), we have $|X_t|/\lambda^t \to Y = Y_\lambda$ a.s., and $|X^+_t|/\lambda^t \to Y^+ = Y^+_\lambda$ a.s., for some random variables $Y_\lambda$ and $Y^+_\lambda$. Our first (standard, trivial) observation is that $Y_\lambda$ and $Y^+_\lambda$ coincide up to a constant factor.

Lemma 19. We have $Y^+_\lambda = sY_\lambda$ a.s.

Proof. Fix $\delta > 0$, and let $N = N(\delta)$ be a suitably chosen large integer. From standard results, with probability 1, either $\mathfrak{X}_\lambda$ dies out, or there is some minimal $t$ with $|X_t| \ge N$. Choosing $N$ large enough, at this time $t$ the inequalities $\bigl||X_t|/\lambda^t - Y\bigr| \le \delta$ and $\bigl||X^+_t|/\lambda^t - Y^+\bigr| \le \delta$ hold with probability at least $1 - \delta$. But $t$ is a stopping time, so given $t$ and $|X_t|$, each particle in $X_t$ survives independently with probability $s$, and from the Chernoff bounds, provided $N$ was chosen large enough, the ratio between $|X^+_t|$ and $|X_t|$ is within a factor $1 \pm \delta$ of $s$ with probability at least $1 - \delta$.

Hence, with probability at least $1 - 3\delta$ either $Y^+ = Y = 0$, or $Y^+ = (1 + O(\delta))sY + O(\delta)$. The result follows by letting $\delta \to 0$. □

Our ultimate aim is to estimate $\mathbb{P}(0 < |X_t| < \omega/\varepsilon)$ in the range of parameters where this probability is very small (around $1/(\varepsilon^2 n)$, it will turn out). Essentially, this reduces to estimating the lower tail of $Y$; in the light of Lemma 19 we may study $Y^+$ instead. This turns out to be easier, since $(X^+_t)$ is in some sense 'better behaved' than $(X_t)$ when $\varepsilon \to 0$.

When studying $\mathfrak{X}^+_\lambda = (X^+_t)$ it makes sense to condition on the event that $X^+_0$ is non-empty, i.e., that $\mathfrak{X}_\lambda$ survives. Let us write $\tilde{\mathfrak{X}}^+_\lambda = (\tilde{X}^+_t)_{t \ge 0}$ for the conditioned process, i.e., a Galton-Watson process with offspring distribution $Z_\lambda$ started with a single particle. Let $\tilde{Y}^+$ denote $\lim_{t \to \infty} |\tilde{X}^+_t|/\lambda^t$, which exists a.s. Thus $\tilde{Y}^+$ is simply $Y^+$ conditioned on $Y^+ > 0$, up to a set of measure 0. By standard results, $\tilde{Y}^+$ is a continuous random variable with strictly positive density on $(0, \infty)$.

It turns out that we will need both upper and lower tail bounds on $\tilde{Y}^+ = \tilde{Y}^+_\lambda$. The dependence of these bounds on $\varepsilon = \lambda - 1$ is very important. We start with the upper tail.

Lemma 20. There is an absolute constant $c > 0$ such that for any $1 < \lambda \le 2$ and any $x \ge 0$ we have $\mathbb{P}(\tilde{Y}^+ \ge x) \le 2e^{-cx}$.

Proof. Recall that $Z_\lambda$ denotes a Poisson distribution with mean $s\lambda$ conditioned to be at least 1, where $s = s(\lambda)$ is the positive solution to $1 - s = e^{-\lambda s}$. Set

\[ f_\lambda(x) = \mathbb{E}(x^{Z_\lambda}) = \sum_{k \ge 1} x^k \frac{(s\lambda)^k e^{-s\lambda}}{k!\,(1 - e^{-s\lambda})} = \frac{(e^{xs\lambda} - 1)e^{-s\lambda}}{s} = \frac{e^{(x-1)s\lambda} - e^{-s\lambda}}{s}. \]

Note that $f_\lambda(1) = 1$, and, expanding about $x = 1$, we have

\[ f_\lambda(x) = 1 + \lambda(x-1) + \frac{s\lambda^2}{2}(x-1)^2 + \cdots = 1 + \lambda(x-1) + O\bigl(s\lambda^2(x-1)^2\bigr), \]

provided $s\lambda(x-1)$ is bounded. More precisely, recalling that $s \sim 2\varepsilon$ as $\lambda \to 1$, and that $\lambda \le 2$, it is easy to check that if $0 \le x \le 2$, say, then we have

\[ f_\lambda(x) \le 1 + \lambda(x-1) + C_1\varepsilon(x-1)^2 \tag{61} \]

for some absolute constant $C_1$, which we shall take to be at least 1.

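Suppressing the dependence on $\lambda$ in the notation, for $t \ge 0$ let $g_t(\theta) = \mathbb{E}(e^{\theta|\tilde{X}^+_t|/\lambda^t}) - 1$. Since $|\tilde{X}^+_0|$ is always 1, we have $g_0(\theta) = e^\theta - 1 = \theta + O(\theta^2)$ for $\theta$ bounded; in particular,

\[ g_0(\theta) \le \theta + \theta^2 \tag{62} \]

if $\theta \le 1$.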
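Given $N = |\tilde{X}^+_1|$, the conditional distribution of $|\tilde{X}^+_{t+1}|$ is simply the sum of $N$ independent copies of $|\tilde{X}^+_t|$, so

\begin{align*}
g_{t+1}(\theta) &= \mathbb{E}\bigl(\mathbb{E}(e^{\theta|\tilde{X}^+_{t+1}|/\lambda^{t+1}} \mid N)\bigr) - 1 = \mathbb{E}\bigl((\mathbb{E}\, e^{(\theta/\lambda)|\tilde{X}^+_t|/\lambda^t})^N\bigr) - 1 \\
&= \mathbb{E}\bigl((1 + g_t(\theta/\lambda))^N\bigr) - 1 = f_\lambda(1 + g_t(\theta/\lambda)) - 1,
\end{align*}

since $N = |\tilde{X}^+_1| \sim Z_\lambda$. With $\theta$ and $t$ fixed, set $y_r = g_r(\theta/\lambda^{t-r})$, so $y_t = g_t(\theta)$, $y_0 = g_0(\theta/\lambda^t)$, and $y_{r+1} = f_\lambda(1 + y_r) - 1$ for $0 \le r \le t-1$. From (61), if $y_r \le 1$, then

\[ y_{r+1} \le \lambda y_r + C_1\varepsilon y_r^2 \le \lambda y_r(1 + C_1\varepsilon y_r). \tag{63} \]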

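Suppose $0 \le \theta \le 1/(100C_1) \le 1/100$. Then we claim that

\[ y_r \le 2\theta/\lambda^{t-r} \tag{64} \]

holds for $r = 0, 1, \ldots, t$. This is certainly true for $r = 0$, since $y_0 = g_0(\theta/\lambda^t) \le (\theta/\lambda^t)(1 + \theta/\lambda^t) \le 2\theta/\lambda^t$. If (64) holds for $r = 0, 1, \ldots, s-1$, then in particular $y_r \le 1$ for $r < s$, so from (62) and (63) we have

\[ y_s \le \frac{\theta}{\lambda^t}\Bigl(1 + \frac{\theta}{\lambda^t}\Bigr)\lambda^s \prod_{r<s} (1 + C_1\varepsilon y_r) \le \frac{\theta}{\lambda^{t-s}} \exp\Bigl(\theta + C_1\varepsilon \sum_{r<s} y_r\Bigr). \]

Using (64) for $r < s$, we have $\sum_{r<s} y_r \le \sum_{0 \le r < t} 2\theta/\lambda^{t-r} \le 2\theta \sum_{r \ge 1} \lambda^{-r} = 2\theta/\varepsilon$. Since $\theta \le 1/(100C_1)$, (64) for $r = s$ follows, completing the proof of (64) by induction.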

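Setting $r = t$ in (64), we have in particular that $y_t = g_t(\theta) \le 1/(50C_1) \le 1$ for $\theta \le \theta_0 = 1/(100C_1)$. Hence the moment generating functions $\mathbb{E}(e^{\theta|\tilde{X}^+_t|/\lambda^t})$ are uniformly bounded by 2 for all $1 < \lambda \le 2$ and all $\theta \le \theta_0$.

With $\lambda$ fixed, we have $|\tilde{X}^+_t|/\lambda^t \to \tilde{Y}^+$ a.s. By Fatou's Lemma, it follows that

\[ \mathbb{E}(e^{\theta_0\tilde{Y}^+}) \le \liminf_{t \to \infty} \mathbb{E}(e^{\theta_0|\tilde{X}^+_t|/\lambda^t}) = \liminf_{t \to \infty} \bigl(1 + g_t(\theta_0)\bigr) \le 2. \]

Applying Markov's inequality, it follows that for any $x$ we have $\mathbb{P}(\tilde{Y}^+ \ge x) \le 2e^{-\theta_0 x}$, completing the proof of the lemma. □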
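
The recursion $y_{r+1} = f_\lambda(1+y_r) - 1$ used in this proof is straightforward to iterate numerically. The Python sketch below is an illustration, not part of the proof; the function names are hypothetical, and $\theta = 0.001$ is simply assumed to be small enough, since the absolute constant $C_1$ is not specified. It uses the simplification $f_\lambda(1+y) - 1 = (e^{ys\lambda} - 1)/s$, which follows from $1 - e^{-s\lambda} = s$, and checks that $g_t(\theta)$ stays between $\theta$ and $2\theta$ for several $\lambda$ and $t$.

```python
import math

def survival_probability(lam):
    """Positive root s of 1 - s = exp(-lam * s), for lam > 1, by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if 1 - mid - math.exp(-lam * mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mgf_minus_one(lam, theta, t):
    """g_t(theta), via y_0 = g_0(theta / lam^t) and y_{r+1} = f_lam(1 + y_r) - 1."""
    s = survival_probability(lam)
    y = math.expm1(theta / lam**t)           # g_0(x) = e^x - 1
    for _ in range(t):
        # f_lam(1 + y) - 1 = (e^{y s lam} - 1) / s, using 1 - e^{-s lam} = s
        y = math.expm1(y * s * lam) / s
    return y

theta = 0.001
for lam in (1.01, 1.05, 1.5):
    print(lam, [mgf_minus_one(lam, theta, t) for t in (0, 10, 200)])
```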

Our main application of the upper tail bound above is to show that the sum of many independent copies of $\tilde{Y}^+$ is tightly concentrated.

Lemma 21. Let $c$, $A$ and $\delta$ be positive constants. There is a constant $a = a(c, A, \delta) > 0$ such that, if $Z$ is any random variable satisfying the tail bound $\mathbb{P}(|Z - \mathbb{E}Z| \ge x) \le Ae^{-cx}$ for all $x > 0$, and $S_n$ is the sum of $n$ independent copies of $Z$, then

\[ \mathbb{P}(|S_n/n - \mathbb{E}Z| \ge \delta) \le e^{-an} \]

for all $n \ge 1$.

Proof. For $|\theta| < 1/c$, let $\phi(\theta) = \mathbb{E}(e^{\theta(Z-\mu-\delta)})$ where $\mu = \mathbb{E}Z$; the tail bound on $Z$ implies that $\phi(\theta)$ is finite. Then $\phi(0) = 1$, $\phi'(0) = -\delta$, and

\[ \phi''(\theta) = \mathbb{E}\bigl((Z - \mu - \delta)^2 e^{\theta(Z-\mu-\delta)}\bigr), \]

which is bounded by a constant due to the tail bound. Hence there are positive constants $c''$ (which we may take smaller than $1/(c\delta)$) and $c'$ such that $\phi(\theta) \le 1 - c'\delta^2$ when $\theta = c''\delta$. Now with $Z_1, \ldots, Z_n$ independent copies of $Z$ and $S_n = \sum Z_i$, we have

\begin{align*}
\mathbb{P}(S_n \ge n(\mu+\delta)) &\le \mathbb{E}\, e^{\theta S_n} e^{-\theta(\mu+\delta)n} \\
&= \phi(\theta)^n \quad \text{by independence of the } Z_i \\
&\le (1 - c'\delta^2)^n \le e^{-c'\delta^2 n}.
\end{align*}

An exponential upper bound on $\mathbb{P}(S_n \le n(\mu-\delta))$ is obtained by considering $\mathbb{E}(e^{\theta(\mu-\delta-Z)})$ similarly. □

Using Lemma 21 it is easy to show that up to an error probability that is exponentially small in $\omega$, the martingale $|\tilde{X}^+_t|/\lambda^t$ has essentially converged to its (almost sure) limit $\tilde{Y}^+$ by the time that $|\tilde{X}^+_t|$ first reaches size $\omega$. As before, it is crucial that the concentration we obtain is uniform in $\lambda$ as $\lambda \searrow 1$.

Lemma 22. Let $0 < \delta < 1$, $1 < \lambda \le 2$ and $\omega \ge 1$ be given, let $t_\omega = \min\{t : |\tilde{X}^+_t| \ge \omega\}$, whenever this is defined, and let $E$ be the event that $|\tilde{X}^+_t|/\lambda^t$ is within a factor $1 \pm \delta$ of $\tilde{Y}^+$ for all $t \ge t_\omega$.

Then $t_\omega$ is defined with probability 1, and $\mathbb{P}(E) = 1 - e^{-\Omega(\omega)}$, where the implicit constant depends on $\delta$ but not on $\lambda$.

Proof. The sequence $|X_t^+|$ is non-decreasing, and increases with probability bounded
away from zero (at least $\mathbb{P}(Z_\lambda > 1)$) at each step, so $|X_t^+| \to \infty$ a.s., and $t_\omega$ is indeed
defined with probability 1.
Let $A$ be the event

\[ A = \bigl\{ (1-\delta/10)\,|X_{t_\omega}^+|/\lambda^{t_\omega} \le Y_\lambda^+ \le (1+\delta/10)\,|X_{t_\omega}^+|/\lambda^{t_\omega} \bigr\}. \]

Our first aim is to show that $A$ is very likely to hold. Let us condition on the event
$t_\omega = t$, where $t \ge 0$, and also on $|X_t^+|$. Since $t_\omega$ is a stopping time, given that $t_\omega = t$
and $|X_t^+| = m$, the descendants of the $m \ge \omega$ particles in $X_t^+$ form independent copies
of the original process. Let $n_{t',i}$ denote the number of descendants in generation $t'$ of
the $i$th particle in $X_t^+$. Then for each $i$ we have $n_{t',i}/\lambda^{t'-t} \to Y_i^+$ a.s., where the $Y_i^+$ are
independent and have the distribution of $Y_\lambda^+$. It follows that $Y_\lambda^+ = \sum_{i=1}^m Y_i^+/\lambda^t$ a.s.
Now $Y_\lambda^+$ has mean 1, and $m \ge \omega$. Applying Lemmas 20 and 21 we see that

\[ \mathbb{P}\Bigl( \Bigl| \frac{1}{m}\sum_{i=1}^m Y_i^+ - 1 \Bigr| \ge \delta/10 \Bigr) = e^{-\Omega(m)} = e^{-\Omega(\omega)}, \]

so

\[ \mathbb{P}(A) = 1 - e^{-\Omega(\omega)}. \]

Let $B_-$ be the event that $t_\omega$ is defined, and there is some $t > t_\omega$ for which $|X_t^+|/|X_{t_\omega}^+| \le
(1-\delta/2)\lambda^{t-t_\omega}$. If $B_-$ holds, let $t_1$ be the first such time. Then $t_1$ (which is not always
defined) is again a stopping time so, arguing as above, given that $t_1 = t$ and $|X_t^+| = m$,
we have $Y_\lambda^+ = \sum_{i=1}^m Y_i^+/\lambda^{t_1}$, where the $Y_i^+$ are iid with the distribution of $Y_\lambda^+$. This
also holds if we condition on the entire history up to time $t_1$, and in particular on $t_\omega$
and $r = |X_{t_\omega}^+|$.

By definition of $B_-$ we have $m \le m_0 = (1-\delta/2)\lambda^{t_1-t_\omega} r$, so recalling that the $Y_i^+$ are
independent copies of $Y_\lambda^+$,

\[ \mathbb{P}\Bigl( \sum_{i=1}^{m} Y_i^+ \ge (1+\delta/10)m_0 \Bigr) \le \mathbb{P}\Bigl( \sum_{i=1}^{\lfloor m_0 \rfloor} Y_i^+ \ge (1+\delta/10)m_0 \Bigr) = e^{-\Omega(m_0)} = e^{-\Omega(\omega)}, \]

using Lemma 21 and the fact that $\lambda^{t_1-t_\omega} r \ge r \ge \omega$ for the last step. If $B_-$ and $A$ both
hold, then the event appearing on the left above also holds, so we have shown that
$\mathbb{P}(A \mid B_-) = e^{-\Omega(\omega)}$. Hence, $\mathbb{P}(A \cap B_-) \le \mathbb{P}(A \mid B_-) = e^{-\Omega(\omega)}$.

Define $B_+$ to be the event that there is some $t > t_\omega$ for which $|X_t^+|/|X_{t_\omega}^+| \ge (1+\delta/2)\lambda^{t-t_\omega}$.
A similar but simpler argument shows that $\mathbb{P}(A \cap B_+) = e^{-\Omega(\omega)}$. Hence with
probability $1 - e^{-\Omega(\omega)}$ the event $A$ holds, while neither $B_-$ nor $B_+$ does, and the lemma
follows. □

Returning to the original branching process $\mathfrak{X}_\lambda = (X_t)$, recall that this survives
with probability $s = s(\lambda) = \Theta(\varepsilon)$, where $\lambda = 1+\varepsilon$. Recall also that $|X_t|/\lambda^t \to Y_\lambda$ a.s.,
where by standard results $\mathbb{E}(Y_\lambda) = 1$, and $Y_\lambda = 0$ if and only if the process dies out,
so $\mathbb{P}(Y_\lambda \ne 0) = s$. Also, recalling that $Y_\lambda^+$ has the distribution of $\tilde{Y}_\lambda^+$ conditioned on
$\tilde{Y}_\lambda^+ > 0$, Lemma 19 implies that the distribution of $sY_\lambda$ given that $Y_\lambda \ne 0$ is exactly the
distribution of $Y_\lambda^+$.

The next lemma will be similar to Lemma 22, but concerning $\mathfrak{X}_\lambda = (X_t)$. This
will lead us to consider the sum $S_N$ of $N$ independent copies $Y_i$ of $Y_\lambda$, for $\omega$ large and
$N \ge \omega/\varepsilon$. Given $0 < \delta < 1$, from concentration of the binomial distribution, with
probability $1 - e^{-\Omega(\omega)}$ the number $M$ of $i$ with $Y_i \ne 0$ is within a factor $1 \pm \delta$ of its mean
$sN = \Omega(\omega)$. Conditional on $M$, the variable $sS_N$ is the sum of $M$ independent copies of
$sY_\lambda$ each conditioned to be positive, or equivalently of $M$ independent copies of $Y_\lambda^+$, so
by Lemma 21 with probability $1 - e^{-\Omega(M)}$ this sum is within a factor $1 \pm \delta$ of its mean
$M$. It follows that with probability $1 - e^{-\Omega(\omega)}$ we have $|S_N/N - 1| \le 3\delta$, say. Using
this fact in place of concentration of the sum of $\omega$ copies of $Y_\lambda^+$, the proof of Lemma 22
gives the following result, which is more or less a sharpening of Lemma 18. Recall that
$Y_\lambda = \lim_{t\to\infty} |X_t|/\lambda^t$.

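Lemma 23. Let $\delta > 0$, $1 < \lambda < 2$ and $\omega > 1$ be given, and set $\varepsilon = \lambda - 1$. Let
$t_{\omega/\varepsilon} = \min\{t : |X_t| \ge \omega/\varepsilon\}$, whenever this is defined, let $\mathcal{S}_\omega$ be the event that $t_{\omega/\varepsilon}$ is
defined, let $E \subset \mathcal{S}_\omega$ be the event that $|X_t|/\lambda^t$ is within a factor $1 \pm \delta$ of $Y_\lambda$ for all $t \ge t_{\omega/\varepsilon}$,
and let $\mathcal{S} = \{\forall t : |X_t| > 0\}$ be the event that the process survives.

Then $\mathbb{P}(E \mid \mathcal{S}_\omega) = 1 - e^{-\Omega(\omega)}$, where the implicit constant depends on $\delta$ but not on
$\lambda$. Furthermore, $\mathbb{P}(\mathcal{S} \mid \mathcal{S}_\omega) = 1 - e^{-\Omega(\omega)}$ and $\mathbb{P}(\mathcal{S}_\omega \setminus E) = O(\varepsilon e^{-\Omega(\omega)})$.

Proof. The first statement follows by modifying the proof of Lemma 22 as described
above. The second is an immediate consequence (and also easy to verify directly). It
implies in particular that $\mathbb{P}(\mathcal{S})/\mathbb{P}(\mathcal{S}_\omega)$ is bounded below, so $\mathbb{P}(\mathcal{S}_\omega) = O(\mathbb{P}(\mathcal{S})) = O(\varepsilon)$.
The final statement then follows from the first. □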

Lemma 23 tells us that for $\omega$ large enough, the probability that the branching process
takes much longer than expected to reach size $\omega/\varepsilon$ is essentially determined by the tail
of the distribution of $Y = Y_\lambda$ near 0. Lemma 22 will be useful in studying this tail
indirectly.

Writing $\mathbb{R}^+$ for the set of non-negative reals, let $(\mathcal{Y}_t)_{t\in\mathbb{R}^+}$ be a standard Yule process.
Thus $\mathcal{Y}_0$ consists of a single particle, and each particle in the process survives for ever
and gives rise to children according to a Poisson process with rate 1, independently
of the other particles and of the history. Note that $|\mathcal{Y}_t|$ is a (random) non-decreasing
function of $t$, and that $\mathbb{E}(|\mathcal{Y}_t|) = e^t$. It is well known that $\lim_{t\to\infty} |\mathcal{Y}_t|/e^t$ exists with
probability 1; we denote this (random) limit by $W$.

It is not hard to see that as $\lambda$ decreases to 1, the suitably rescaled process $\mathfrak{X}_\lambda^+$
converges in some sense to $(\mathcal{Y}_t)$. All we shall need is a very weak result of this form.

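Lemma 24. Let $T > 0$ be fixed. As $\lambda = 1+\varepsilon$ tends to 1 from above, the distribution of
$|X_{\lfloor T/\varepsilon\rfloor}^+|$ converges to that of $|\mathcal{Y}_T|$.

Proof. We take snapshots of $(\mathcal{Y}_t)$ at times separated by $\varepsilon$, i.e., consider $Y_n = \mathcal{Y}_{n\varepsilon}$,
$n = 0, 1, \ldots, \lfloor T/\varepsilon\rfloor$. Each particle $x$ in $Y_n$ always survives to $Y_{n+1}$, has no children in $Y_{n+1}$
with probability $\mathbb{P}(\mathrm{Po}(\varepsilon) = 0) = e^{-\varepsilon} = 1 - \varepsilon + O(\varepsilon^2)$, and has exactly one child in $Y_{n+1}$
with probability $\mathbb{P}(\mathrm{Po}(\varepsilon) = 1) = \varepsilon + O(\varepsilon^2)$. Furthermore, the probability that this child
(if it exists) has children of its own by time $(n+1)\varepsilon$ is $O(\varepsilon)$. Hence, the number $Z'$
of descendants of $x$ in $Y_{n+1}$ is 1 with probability $1 - \varepsilon + O(\varepsilon^2)$, two with probability
$\varepsilon + O(\varepsilon^2)$ and three or more with probability $O(\varepsilon^2)$. Hence $Z'$ and $Z_\lambda$, the offspring
distribution in $\mathfrak{X}_\lambda^+$, can be coupled to agree with probability $1 - O(\varepsilon^2)$.

Using the independence properties of $\mathfrak{X}_\lambda^+$ and of $(\mathcal{Y}_t)$, it follows that these processes
can be coupled so that the event $E = \{|X_n^+| = |Y_n|,\ n = 0, 1, \ldots, \lfloor T/\varepsilon\rfloor\}$ fails to hold
with probability at most

\[ O(\varepsilon^2) \sum_{n<T/\varepsilon} \mathbb{E}(|Y_n|) = O(\varepsilon^2) \sum_{n<T/\varepsilon} e^{\varepsilon n} = O(\varepsilon)e^T = O(\varepsilon). \]

Since $Y_{\lfloor T/\varepsilon\rfloor} = \mathcal{Y}_T$ with probability $1 - O(\varepsilon)$, the result follows. □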
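
Two simple consequences of the convergence in Lemma 24 can be checked numerically. This is a sketch for illustration only, with function names of our own choosing. Since $\mathbb{E}(Z_\lambda) = \lambda$ and $\mathbb{P}(Z_\lambda = 1) = \lambda_*$, we have $\mathbb{E}|X_n^+| = \lambda^n$ and $\mathbb{P}(|X_n^+| = 1) = \lambda_*^n$, while for the Yule process $\mathbb{E}|\mathcal{Y}_T| = e^T$ and $\mathbb{P}(|\mathcal{Y}_T| = 1) = e^{-T}$. The code assumes the usual definitions $s = 1 - e^{-\lambda s}$ and $\lambda_* = \lambda(1-s)$, and solves for $s$ by bisection.

```python
import math


def survival_probability(lam: float) -> float:
    """Positive root s of s = 1 - exp(-lam * s), for lam > 1, by bisection."""
    def f(s: float) -> float:
        return -math.expm1(-lam * s) - s

    lo, hi = 1e-12, 1.0  # f(lo) > 0 > f(hi) provided lam - 1 is not tiny
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def snapshot_check(T: float, eps: float):
    """Compare the discrete process at generation floor(T/eps) with the Yule process at time T.

    Returns (E|X_n^+| / E|Y_T|, P(|X_n^+| = 1) / P(|Y_T| = 1)); both should be near 1
    when eps is small.
    """
    lam = 1.0 + eps
    lam_star = lam * (1.0 - survival_probability(lam))  # equals P(Z_lambda = 1)
    n = math.floor(T / eps)
    mean_ratio = lam ** n / math.exp(T)
    single_ratio = lam_star ** n / math.exp(-T)
    return mean_ratio, single_ratio


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, snapshot_check(2.0, eps))
```

Both ratios approach 1 as $\varepsilon$ decreases, consistent with the $O(\varepsilon)$ coupling error in the proof.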

Corollary 25. As $\lambda = 1+\varepsilon$ tends to 1 from above, $Y_\lambda^+$ converges in distribution to $W$.

Proof. Fix $\delta > 0$. It suffices to show that for $\varepsilon$ sufficiently small we can couple $Y_\lambda^+$ and
$W$ so that they agree within a factor of $1 + O(\delta)$ with probability $1 - O(\delta)$.

Let $\omega$ be a constant to be chosen below, depending on $\delta$ but not on $\varepsilon$. Since $|\mathcal{Y}_t| \to \infty$
with probability 1, there is some $T$ such that $\mathbb{P}(|\mathcal{Y}_T| < \omega) \le \delta$. From Lemma 24, if
$\varepsilon$ is sufficiently small, then we may couple $\mathfrak{X}_\lambda^+$ and $(\mathcal{Y}_t)$ so that with probability at
least $1 - \delta$ we have $|\mathcal{Y}_T| = |X_{\lfloor T/\varepsilon\rfloor}^+|$. Then with probability at least $1 - 2\delta$ we have

\[ |\mathcal{Y}_T| = |X_{\lfloor T/\varepsilon\rfloor}^+| \ge \omega. \]

Let $n = \lfloor T/\varepsilon\rfloor$. Applying Lemma 22, it follows that if $\omega$ is chosen large enough
(depending only on $\delta$, not on $\varepsilon$), then with probability at least $1 - 3\delta$ the limit $Y_\lambda^+$
is within a factor $1 \pm \delta$ of $|X_n^+|/\lambda^n$. A similar result holds for $(\mathcal{Y}_t)$. (Indeed, since
$|\mathcal{Y}_t|/e^t \to W$ a.s., there must be some constant $T'$ such that with probability $1 - \delta$ we
have $|\mathcal{Y}_t|/e^t$ within a factor $1 \pm \delta$ of $W$ for all $t \ge T'$.) In particular, if $\omega$ is large enough,
then with probability $1 - \delta$ the ratio $|\mathcal{Y}_T|/e^T$ is within a factor of $(1 \pm \delta)$ of $W$.

Putting the pieces together, and noting that $\lambda^n = (1+\varepsilon)^{\lfloor T/\varepsilon\rfloor} = e^{T+O(\varepsilon)}$, for $\varepsilon$
small enough the quantities $Y_\lambda^+$, $|X_n^+|/\lambda^n$, $|X_n^+|/e^T$, $|\mathcal{Y}_T|/e^T$ and $W$ agree up to factors
of $1 + O(\delta)$ with probability $1 - O(\delta)$, completing the proof. □

It is well known, and not hard to check, that the (positive) random variable $W$ associated
to the Yule process has an exponential distribution with mean 1. In particular,

\[ \mathbb{P}(W \le x) = 1 - e^{-x} \sim x \tag{65} \]

as $x \to 0$ from above. We are now ready to prove our bound on the lower tail of $Y_\lambda^+$.
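
The exponential law of $W$ can be illustrated numerically. This is a sketch only; it relies on the classical fact (not restated in the text above) that the population $|\mathcal{Y}_t|$ of a standard Yule process started from one particle is geometric on $\{1, 2, \ldots\}$ with success probability $e^{-t}$. The function name is ours.

```python
import math


def yule_scaled_cdf(x: float, t: float) -> float:
    """P(|Y_t| / e^t <= x) for a standard Yule process started from one particle.

    Uses P(|Y_t| <= k) = 1 - (1 - e^{-t})^k for the geometric population size.
    """
    p = math.exp(-t)
    k = math.floor(x / p)  # largest integer k with k * e^{-t} <= x
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    for x in (0.01, 0.1, 0.5, 2.0):
        print(x, yule_scaled_cdf(x, 12.0), 1.0 - math.exp(-x))
```

For large $t$ the two printed columns agree closely, and for small $x$ both are close to $x$, as in (65).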
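Theorem 26. Let $\lambda = 1+\varepsilon$. As $\varepsilon$ and $x$ tend to 0 from above we have

\[ \mathbb{P}(Y_\lambda^+ \le x) \sim x^{\log(1/\lambda_*)/\log\lambda}. \]

Note that we make no assumption on the relative rates at which $\varepsilon$ and $x$ tend to
zero. With $x$ fixed, the result would be immediate from (65) and Corollary 25.

Proof. Let $\delta > 0$ be given. We must show that there are constants $x_0 = x_0(\delta)$ and
$\varepsilon_0 = \varepsilon_0(\delta)$ such that for all $0 < \varepsilon \le \varepsilon_0$ and $0 < x \le x_0$ we have
$\mathbb{P}(Y_\lambda^+ \le x) = e^{O(\delta)} x^{\log(1/\lambda_*)/\log\lambda}$, where the implicit constant is absolute.

By (65), there is an $x_1 > 0$ such that for all $x \le x_1$ we have

\[ e^{-\delta} \le \mathbb{P}(W \le x)/x \le e^{\delta}. \tag{66} \]

Fix such an $x_1$, and set $x_0 = \min\{x_1, \delta\}$.

Trivially, for $(1-\delta)x_0 \le x \le x_0$ and any $\lambda$, we have

\[ \mathbb{P}(W \le (1-\delta)x_0) \le \mathbb{P}(W \le x) \le \mathbb{P}(W \le x_0) \]

and

\[ \mathbb{P}(Y_\lambda^+ \le (1-\delta)x_0) \le \mathbb{P}(Y_\lambda^+ \le x) \le \mathbb{P}(Y_\lambda^+ \le x_0). \]

As $\varepsilon \to 0$, from Corollary 25 for any constant $a$ we have $\mathbb{P}(Y_\lambda^+ \le a) \to \mathbb{P}(W \le a)$.
Applying this with $a = x_0$ and $a = (1-\delta)x_0$, it follows that there is an $\varepsilon_0$ such that

\[ e^{-\delta}\,\mathbb{P}(W \le (1-\delta)x_0) \le \mathbb{P}(Y_\lambda^+ \le x) \le e^{\delta}\,\mathbb{P}(W \le x_0) \tag{67} \]

for all $\varepsilon \le \varepsilon_0$ and all $x$ in the interval $I = [(1-\delta)x_0, x_0]$. We may and shall assume that
$\varepsilon_0 < 1/10$, say. Since $\log(1/\lambda_*)/\log\lambda \to 1$ as $\varepsilon \to 0$, reducing $\varepsilon_0$ if necessary, we have
$x^{\log(1/\lambda_*)/\log\lambda} = e^{O(\delta)}x$ uniformly in $x \in I$ and $\varepsilon \le \varepsilon_0$. Using (66) and (67), it follows
that

\[ \mathbb{P}(Y_\lambda^+ \le x) = e^{O(\delta)} x^{\log(1/\lambda_*)/\log\lambda} \tag{68} \]

for all $x \in I$ and $\varepsilon \le \varepsilon_0$, where the implicit constant is absolute.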

At this point we return to the definition of $Y_\lambda^+$ in terms of $(X_t^+)$. Recall that
$Z = Z_\lambda$ is a Poisson distribution with parameter $s\lambda$ conditioned on being non-zero, and
that $\mathbb{E}(Z) = \lambda$ and $\mathbb{P}(Z = 1) = \lambda_*$. Now $\lambda Y_\lambda^+$ has the distribution of the sum of $Z$
independent copies $Y_1, \ldots, Y_Z$ of $Y_\lambda^+$. Hence, for any $x$,

\[ \mathbb{P}(Y_\lambda^+ \le x) \ge \mathbb{P}(Z = 1,\ Y_\lambda^+ \le x) = \mathbb{P}(Z = 1)\,\mathbb{P}(Y_1 \le \lambda x) = \lambda_*\,\mathbb{P}(Y_\lambda^+ \le \lambda x). \]

Given any $x \le x_0$, there is some non-negative integer $r$ such that $x' = x\lambda^r$ lies in
$I$. From the inequality above it follows that $\mathbb{P}(Y_\lambda^+ \le x) \ge \lambda_*^r\,\mathbb{P}(Y_\lambda^+ \le x')$. Applying
(68) to bound the second probability, and noting that $\lambda_*^r = \exp((-r)\log(1/\lambda_*)) =
\exp(\log(1/\lambda_*)\log(x/x')/\log\lambda)$, it follows that

\[ \mathbb{P}(Y_\lambda^+ \le x) \ge (x/x')^{\log(1/\lambda_*)/\log\lambda}\, e^{O(\delta)} (x')^{\log(1/\lambda_*)/\log\lambda} = e^{O(\delta)} x^{\log(1/\lambda_*)/\log\lambda}, \]

completing the proof of the lower bound.

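For the upper bound we use the inequality

\[ \begin{aligned}
\mathbb{P}(Y_\lambda^+ \le x) &= \mathbb{P}(Z = 1,\ Y_\lambda^+ \le x) + \mathbb{P}(Z \ge 2,\ Y_\lambda^+ \le x) \\
&\le \mathbb{P}(Z = 1)\,\mathbb{P}(Y_1 \le \lambda x) + \mathbb{P}(Z \ge 2)\,\mathbb{P}(Y_1 + Y_2 \le \lambda x) \\
&\le \mathbb{P}(Z = 1)\,\mathbb{P}(Y_\lambda^+ \le \lambda x) + \mathbb{P}(Z \ge 2)\,\mathbb{P}(Y_\lambda^+ \le \lambda x)^2 \\
&= \lambda_*\,\mathbb{P}(Y_\lambda^+ \le \lambda x)\Bigl(1 + \frac{1-\lambda_*}{\lambda_*}\,\mathbb{P}(Y_\lambda^+ \le \lambda x)\Bigr).
\end{aligned} \]

Given $x \le x_0$, as before there is a non-negative integer $r$ such that $x' = x\lambda^r \in I$. For
$0 \le i \le r$ let $x_i = x'/\lambda^i$, so $x_0 = x'$ and $x_r = x$. Let $p_i = \mathbb{P}(Y_\lambda^+ \le x_i)$, so

\[ p_{i+1} \le \lambda_* p_i\bigl(1 + \lambda_*^{-1}(1-\lambda_*)p_i\bigr) \]

and hence, by induction,

\[ p_i \le \lambda_*^i p_0 \prod_{j<i}\bigl(1 + \lambda_*^{-1}(1-\lambda_*)p_j\bigr) \le \lambda_*^i p_0 \exp\Bigl((\lambda_*^{-1}-1)\sum_{j<i} p_j\Bigr). \tag{69} \]

Now $p_0 = \mathbb{P}(Y_\lambda^+ \le x')$ and $x' \in I$, so, recalling that $x_0 \le \delta$, we have $p_0 = O(\delta)$. Thus
$p_0 \le 1/10$, say, if we assume $\delta$ is small, which we may. It follows by induction on $i$ that
$p_i \le 2\lambda_*^i p_0 \le \lambda_*^i/5$. Indeed, if this holds for $j < i$, then the term inside the exponential
in (69) is at most

\[ (\lambda_*^{-1}-1)\sum_{j<i} \lambda_*^j/5 \le \lambda_*^{-1}(1-\lambda_*)\sum_{j=0}^{\infty} \lambda_*^j/5 = \lambda_*^{-1}/5 \le 1/4, \]

and $e^{1/4} \le 2$. Plugging $p_j \le 2\lambda_*^j p_0$ back into (69), we see that $p_i \le \lambda_*^i p_0 \exp(3p_0) =
\lambda_*^i p_0 e^{O(\delta)}$. Calculating as for the lower bound above, this establishes the required
upper bound.

□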
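
The quantities appearing in Theorem 26 and its proof are easy to compute. The following sketch is illustrative only, with function names of our own. It assumes the standard relations $s = 1 - e^{-\lambda s}$ and $\lambda_* = \lambda(1-s)$, and checks that $\mathbb{P}(Z_\lambda = 1) = \lambda_*$, that $\lambda_* e^{-\lambda_*} = \lambda e^{-\lambda}$, that $s \sim 2\varepsilon$, and that the exponent $\log(1/\lambda_*)/\log\lambda$ is close to 1 for small $\varepsilon$.

```python
import math


def survival_probability(lam: float) -> float:
    """Positive root s of s = 1 - exp(-lam * s), for lam > 1, by bisection."""
    def f(s: float) -> float:
        return -math.expm1(-lam * s) - s

    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def dual_quantities(eps: float):
    """Return (s, lam_star, P(Z = 1), exponent) for lam = 1 + eps.

    Z is Po(s * lam) conditioned to be non-zero; the exponent is
    log(1/lam_star) / log(lam), as in Theorem 26.
    """
    lam = 1.0 + eps
    s = survival_probability(lam)
    lam_star = lam * (1.0 - s)
    mu = s * lam
    p_z1 = mu * math.exp(-mu) / (-math.expm1(-mu))
    exponent = math.log(1.0 / lam_star) / math.log(lam)
    return s, lam_star, p_z1, exponent


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        print(eps, dual_quantities(eps))
```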

Remark 4. The method used above shows that for $\lambda$ and $x$ bounded above, $\mathbb{P}(Y_\lambda^+ \le x)$
is within a factor $C$ of $x^{\log(1/\lambda_*)/\log\lambda}$, where $C$ depends only on the bounds we assume
on $\lambda$ and $x$. For $\lambda$ constant and $x \to 0$, this is a standard result. Perhaps surprisingly,
the conclusion of Theorem 26 does not hold in this case: as pointed out to us by Svante
Janson, the limiting behaviour of $\mathbb{P}(Y_\lambda^+ \le x)/x^{\log(1/\lambda_*)/\log\lambda}$ as $x \to 0$ is oscillatory. One
period corresponds to changing $x$ by a factor of $\lambda$, and the tail probability by a factor
of $1/\lambda_*$.

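In the light of Lemma 19 and the fact that $Y_\lambda^+$ is just $\tilde{Y}_\lambda^+$ conditioned on being
non-zero, an event of probability $s = s(\lambda) \sim 2\varepsilon$, Theorem 26 has the following corollary.

Corollary 27. Let $\lambda = 1+\varepsilon$. As $\varepsilon$ and $x$ tend to 0 from above we have

\[ \mathbb{P}(0 < Y_\lambda \le x/\varepsilon) \sim 4\varepsilon\, x^{\log(1/\lambda_*)/\log\lambda}. \]

Proof. Let $s = s(\lambda)$ denote the survival probability of $\mathfrak{X}_\lambda$. Then

\[ \begin{aligned}
\mathbb{P}(0 < Y_\lambda \le x/\varepsilon) &= \mathbb{P}(0 < \tilde{Y}_\lambda^+ \le sx/\varepsilon) \\
&= \mathbb{P}(\tilde{Y}_\lambda^+ > 0)\,\mathbb{P}(\tilde{Y}_\lambda^+ \le sx/\varepsilon \mid \tilde{Y}_\lambda^+ > 0) = s\,\mathbb{P}(Y_\lambda^+ \le sx/\varepsilon),
\end{aligned} \]

where the first step is from Lemma 19 and the rest are from the definitions. Applying
Theorem 26, recalling that $s \sim 2\varepsilon$ as $\varepsilon \to 0$, and noting that $\log(1/\lambda_*)/\log\lambda \sim 1$, it
follows that

\[ \mathbb{P}(0 < Y_\lambda \le x/\varepsilon) \sim s\,(sx/\varepsilon)^{\log(1/\lambda_*)/\log\lambda} \sim 2\varepsilon\bigl((2+o(1))x\bigr)^{\log(1/\lambda_*)/\log\lambda} \sim 4\varepsilon\, x^{\log(1/\lambda_*)/\log\lambda}, \]

as claimed. □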

In turn, Corollary 27 and Lemma 23 will give us the required estimate on the probability
that the branching process $\mathfrak{X}_\lambda = (X_t)_{t\ge 0}$ takes a long time to begin to have a
large population.

Before turning to our tail bound, let us make a simple observation.

Lemma 28. Let $\lambda = 1+\varepsilon$ and let $t_M = \min\{t : |X_t| \ge M\}$, whenever this is defined.
Given that $t_M$ is defined, the probability that $|X_{t_M}|$ exceeds $(1+\delta)(1+\varepsilon)M$ is $e^{-\Omega(\delta^2 M)}$,
where the implicit constant is absolute.

Proof. The event that $t_M$ is defined is a disjoint union of events of the form $\{t_M =
t,\ |X_{t-1}| = s\}$, where $0 < s < M$. Let us condition on one such event. Then the
conditional distribution of $|X_t|$ is that of a binomial distribution with mean $(1+\varepsilon)s <
(1+\varepsilon)M$ conditioned to be at least $M$. It is easy to check that the probability that
such a distribution exceeds $(1+\delta)(1+\varepsilon)M$ is maximal when $s$ is maximal, and is then
(from the Chernoff bounds) of the form $e^{-\Omega(\delta^2 M)}$. □

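In the following result, $t_{\omega/\varepsilon}$ denotes $\min\{t : |X_t| \ge \omega/\varepsilon\}$, whenever this is defined.

Theorem 29. Let $\lambda = 1+\varepsilon$, and suppose that $\varepsilon = \varepsilon(n) \to 0$, $\omega = \omega(n) \to \infty$, and
$t = t(n)$ satisfy $t \le 100\log\omega/\log\lambda$ and $\varepsilon t \to \infty$. Then, with $t_1 = \log\omega/\log\lambda$, we have

\[ \mathbb{P}(t_{\omega/\varepsilon} > t_1 + t) \sim 4\varepsilon\lambda_*^t, \]
\[ \mathbb{P}(0 < |X_r| < \omega/\varepsilon,\ 0 \le r \le t_1 + t) \sim 4\varepsilon\lambda_*^t, \]
\[ \mathbb{P}(0 < |X_{t_1+t}| < \omega/\varepsilon) \sim 4\varepsilon\lambda_*^t \]

and

\[ \mathbb{P}\bigl((X_r) \text{ survives and } 0 < |X_{t_1+t}| < \omega/\varepsilon\bigr) \sim 4\varepsilon\lambda_*^t. \]

Proof. We first show that for any fixed $\delta > 0$ we have

\[ \mathbb{P}(t_{\omega/\varepsilon} > t_1 + t) = (1+O(\delta))4\varepsilon\lambda_*^t + O(\varepsilon e^{-\Omega(\omega)}), \tag{70} \]
\[ \mathbb{P}(0 < |X_r| < \omega/\varepsilon,\ 0 \le r \le t_1 + t) = (1+O(\delta))4\varepsilon\lambda_*^t + O(\varepsilon e^{-\Omega(\omega)}), \tag{71} \]
\[ \mathbb{P}(0 < |X_{t_1+t}| < \omega/\varepsilon) = (1+O(\delta))4\varepsilon\lambda_*^t + O(\varepsilon e^{-\Omega(\omega)}) \tag{72} \]

and

\[ \mathbb{P}\bigl((X_r) \text{ survives and } 0 < |X_{t_1+t}| < \omega/\varepsilon\bigr) = (1+O(\delta))4\varepsilon\lambda_*^t + O(\varepsilon e^{-\Omega(\omega)}). \tag{73} \]

Note that the events considered in the first two of these are not quite the same: by
the event $A = \{t_{\omega/\varepsilon} > t_1 + t\}$ we mean the event that $t_{\omega/\varepsilon}$ is defined and greater than
$t_1 + t$; this certainly implies the event considered in (71), but the latter may also hold
with $T = t_{\omega/\varepsilon}$ undefined. Let $\mathcal{S}$ be the event that the process survives, noting that

\[ \mathbb{P}(\mathcal{S} \mid \{t_{\omega/\varepsilon} \text{ is defined}\}) \ge 1 - (1-s)^{\omega/\varepsilon} = 1 - e^{-\Omega(\omega)}. \tag{74} \]

In particular, for large $n$ this conditional probability is at least 1/2, so the probability
that $T$ is defined is at most $2s = O(\varepsilon)$.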

Let $B_1$ be the 'bad' event that $T = t_{\omega/\varepsilon}$ is defined and there is an $r \ge T$ with $|X_r|/\lambda^r$
outside the interval $(1 \pm \delta)Y_\lambda$. By Lemma 23 we have $\mathbb{P}(B_1 \mid T \text{ defined}) = e^{-\Omega(\omega)}$, so
$\mathbb{P}(B_1) = O(\varepsilon e^{-\Omega(\omega)})$. Let $B_2$ be the event that $T$ is defined and $|X_T| \ge (1+\delta)\omega/\varepsilon$. If
$\varepsilon$ is small enough, which we may assume, then $(1+\delta) \ge (1+\varepsilon)(1+\delta/2)$, and from
Lemma 28 we have $\mathbb{P}(B_2) = e^{-\Omega(\omega/\varepsilon)} = O(\varepsilon e^{-\Omega(\omega)})$.

Suppose that $B_1$ does not hold. Then if $t_{\omega/\varepsilon}$ is defined, the process survives. Thus,
off $B_1$, the event that $T$ is defined coincides with $\mathcal{S}$ and hence with the event $Y_\lambda > 0$.
Moreover, off $B_1 \cup B_2$, whenever $Y_\lambda > 0$ we have $Y_\lambda = (1+O(\delta))|X_T|/\lambda^T =
(1+O(\delta))(\omega/\varepsilon)/\lambda^T$. Thus, off $B_1 \cup B_2$, for all sufficiently large constants $a, b > 0$,

(i) $Y_\lambda > (1+a\delta)(\omega/\varepsilon)/\lambda^{t_1+t} = (1+a\delta)/(\varepsilon\lambda^t)$ implies $T \le t_1 + t$, and

(ii) $0 < Y_\lambda < (1-b\delta)/(\varepsilon\lambda^t)$ implies $T > t_1 + t$.

Since $\varepsilon t \to \infty$ we have $1/\lambda^t \to 0$. Thus Corollary 27 applies, and recalling that
$\mathbb{P}(B_1 \cup B_2) = O(\varepsilon e^{-\Omega(\omega)})$, we obtain the bound (70).

To deduce (71), it suffices to show that the probability that the indicated event holds
but $T$ is undefined is $o(\varepsilon\lambda_*^t)$. Recall that, up to events of probability $O(\varepsilon e^{-\Omega(\omega)})$, $T$ is
defined if and only if $\mathcal{S}$ holds, i.e., if and only if the process survives. So it suffices to bound the
probability that $|X_{t_1+t}| > 0$ but $\mathcal{S}$ does not hold. Now

\[ \mathbb{P}(|X_{t_1+t}| > 0,\ \mathcal{S}^c) = \mathbb{P}(\mathcal{S}^c)\,\mathbb{P}(|X_{t_1+t}| > 0 \mid \mathcal{S}^c) = (1-s)\,\mathbb{P}(|X_{t_1+t}^-| > 0) \sim \mathbb{P}(|X_{t_1+t}^-| > 0), \]

where $(X_t^-)$ is the process conditioned on dying out, which has the distribution of $\mathfrak{X}_{\lambda_*}$.
As we shall see shortly (see Lemma 31), $\mathbb{P}(|X_a^-| > 0) = \Theta(\varepsilon\lambda_*^a)$ as $\varepsilon \to 0$ and $a \to \infty$
with $a = \Omega(1/\varepsilon)$. Since $\log\lambda = \Theta(\varepsilon)$, we have $t_1 + t \ge t_1 = \Omega(1/\varepsilon)$, so

\[ \mathbb{P}(|X_{t_1+t}| > 0,\ \mathcal{S}^c) = O(\varepsilon\lambda_*^{t_1+t}) = O\bigl(\varepsilon\lambda_*^t (1/\omega)^{\log(1/\lambda_*)/\log\lambda}\bigr) = o(\varepsilon\lambda_*^t), \]

using $\log(1/\lambda_*)/\log\lambda \sim 1$ and $\omega \to \infty$ for the last step. So (71) follows.

To see that (72) holds, note that we may extend the implication (i) above to imply
$|X_{t_1+t}| \ge \omega/\varepsilon$. Also (ii) can trivially be extended to imply $|X_{t_1+t}| < \omega/\varepsilon$. Again
applying Corollary 27 gives (72). Now (73) also follows since survival coincides with
$Y_\lambda > 0$.

To deduce the statements of the theorem, note that under the assumptions given
on $\varepsilon$, $\omega$ and $t$ but with $\delta > 0$ fixed, we have $\lambda_*^t = \omega^{-O(1)}$, while any function that is
$e^{-\Omega(\omega)}$ decreases faster than any power of $\omega$, so the probabilities in (70)-(73) are all
asymptotically $(1+O(\delta))4\varepsilon\lambda_*^t$. Since $\delta > 0$ is arbitrary, the same conclusion follows for
$\delta \to 0$ slowly enough. □

Theorem [29] is the analogue of Lemma [U giving (in the relevant range) the distribu- 
tion of the time the branching process takes to grow to a certain size. It will turn out, 
however, that we need several further results about the branching process. 



4.4 Further branching process lemmas 

Theorem 29 gives good bounds on the probability that the branching process grows more
slowly than expected. It will turn out that we also need a bound on the probability that
it grows faster. Such a bound is immediate from Lemmas 23, 19 and 20. However (to
handle the case where $\varepsilon^3 n$ grows slowly), when $\varepsilon t$ is small we shall need a bound that
is stronger than the one obtained this way. This is easy to obtain directly using moment
generating functions as in the proof of Lemma 20. Note that we study $(|X_t|)$ here rather
than $(|X_t^+|)$.

Lemma 30. Suppose that $0 < \varepsilon < 1/10$ and $\varepsilon t \le 1/10$. Then for all $N \ge 20t$ we have

\[ \mathbb{P}(|X_t| \ge N) \le e^{-N/(20t)}. \]

Proof. Let $m_r(\theta) = \mathbb{E}e^{\theta|X_r|}$ be the moment generating function of $|X_r|$, so $m_0(\theta) =
e^\theta \le 1 + 2\theta$ for $\theta \le 1/2$. We have

\[ m_{r+1}(\theta) = \mathbb{E}\bigl(m_r(\theta)^{|X_1|}\bigr) = e^{\lambda(m_r(\theta)-1)} \le 1 + \lambda(m_r(\theta)-1) + 2\lambda^2(m_r(\theta)-1)^2 \]

as long as $\lambda(m_r(\theta)-1) \le 3/2$. Suppressing the dependence on $\theta$, setting $g_r = m_r(\theta) - 1$,
as long as $g_r \le 2/5$ we thus have

\[ g_{r+1} \le g_r(1 + \varepsilon + 3g_r) \le g_r\exp(\varepsilon + 3g_r). \]

Taking 9 = l/(20t), so g < 29 < l/(10i), for r < t < e^/W we claim that 

9r <5oexp(er + 3^ 5i ) < g exp(l/10 + 3/10) < 2g < l/(Wt). 

i<r 

The proof is by induction using the final bound $g_i \le 1/(10t)$ for $i < r$ to establish the
second inequality. Hence,

\[ \mathbb{E}\bigl(e^{|X_t|/(20t)}\bigr) = 1 + g_t \le 1 + 1/(10t). \]

Applying Markov's inequality to $e^{|X_t|/(20t)} - 1$, which is always non-negative, it follows
that

\[ \mathbb{P}(|X_t| \ge N) \le \frac{1}{10t}\bigl(e^{N/(20t)} - 1\bigr)^{-1} \le \frac{1}{5t}\,e^{-N/(20t)} \]

whenever $N \ge 20t$, as required. □
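
The moment generating function bound used in the proof of Lemma 30 can be checked directly by iterating the exact recursion $m_{r+1}(\theta) = e^{\lambda(m_r(\theta)-1)}$. The sketch below is a numerical illustration only, with function names of our own. It tracks $g_r = m_r(\theta) - 1$ at $\theta = 1/(20t)$ and then applies Markov's inequality as in the proof.

```python
import math


def mgf_excess(t: int, eps: float) -> float:
    """g_t = E exp(theta |X_t|) - 1 at theta = 1/(20 t), for Po(1 + eps) offspring.

    Iterates g_{r+1} = exp(lam * g_r) - 1 starting from g_0 = exp(theta) - 1.
    """
    lam = 1.0 + eps
    theta = 1.0 / (20 * t)
    g = math.expm1(theta)
    for _ in range(t):
        g = math.expm1(lam * g)
    return g


def markov_tail_bound(N: float, t: int, eps: float) -> float:
    """Markov bound g_t / (exp(N / (20 t)) - 1) on P(|X_t| >= N)."""
    theta = 1.0 / (20 * t)
    return mgf_excess(t, eps) / math.expm1(theta * N)


if __name__ == "__main__":
    for t, eps in ((1, 0.09), (10, 0.01), (100, 0.001)):
        print(t, eps, mgf_excess(t, eps), 1.0 / (10 * t))
```

In the parameter ranges tried here (with $\varepsilon t \le 1/10$), the computed $g_t$ stays below $1/(10t)$, in line with the bound used before Markov's inequality is applied.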
We next turn to various events associated to the subcritical branching process $\mathfrak{X}_{\lambda_*} =
(X_t^-)$. We start by estimating the probability that the process survives to time $t$, as
well as a derived quantity associated to the wedge condition. If our aim is just to
prove Theorem 3, then a considerably simpler form of the following lemma will do.
However, we shall prove a more precise result useful also when it comes to studying the
distribution.

Lemma 31. Let $\varepsilon \to 0$ and set $s_t = s_t(\varepsilon) = \mathbb{P}(|X_t^-| > 0)$. Then for $t = o(1/\varepsilon)$ we have
$s_t \sim 2/t$, while for $t \ge \varepsilon^{-2/3}$ we have $s_t \sim 2\varepsilon/(\lambda_*^{-t} - 1)$. In particular, if $t = \Omega(1/\varepsilon)$, then
$s_t = \Theta(\varepsilon\lambda_*^t)$, and if $\varepsilon t \to \infty$, then $s_t \sim 2\varepsilon\lambda_*^t$. Furthermore,

\[ \prod_{t=1}^{\infty} (1 - s_t) \sim \gamma_0\varepsilon^2 \]

for some constant $\gamma_0 > 0$.

Proof. Let $\tilde{s}_t$ be the probability that a critical Poisson Galton-Watson branching process
survives to time $t$, so $\tilde{s}_0 = 1$ and $\tilde{s}_{t+1} = 1 - \exp(-\tilde{s}_t)$. It is well known, and not hard to
check, that $\tilde{s}_t \sim 2/t$ as $t \to \infty$, and indeed that

\[ \tilde{s}_t = 2t^{-1} + O(t^{-2}). \tag{75} \]

Moreover, $t\tilde{s}_t$ approaches 2 from below. Clearly, $s_t \le \tilde{s}_t$, so we have

\[ s_t \le \tilde{s}_t \le 2/t \tag{76} \]

for all $\varepsilon > 0$ and $t \ge 1$.

On the other hand, we may construct $\mathfrak{X}_{\lambda_*}$ by first constructing a critical process, and
then deleting each edge of the resulting tree with probability $1 - \lambda_* \sim \varepsilon$, independently
of all other edges. If the critical process survives to time $t$, then there is at least one
path of length $t$ witnessing this, and it follows that $s_t \ge \tilde{s}_t\lambda_*^t = \tilde{s}_t(1 - O(\varepsilon t))$. If
$t = o(1/\varepsilon)$, then $\lambda_*^t \sim 1$, so $s_t \sim \tilde{s}_t$ and the first statement of the lemma follows.
Note also for later that

\[ s_t = \tilde{s}_t - O(\varepsilon t)\tilde{s}_t = \tilde{s}_t - O(\varepsilon). \tag{77} \]

For larger $t$ we use the following iterative formula, obtained by considering the number
of particles in $X_1^-$ with descendants in generation $t+1$:

\[ s_{t+1} = \mathbb{P}(\mathrm{Po}(\lambda_* s_t) > 0) = 1 - e^{-\lambda_* s_t} = \lambda_* s_t - \lambda_*^2 s_t^2/2 + O(s_t^3), \]

where, since $\lambda_* < 1$ and $s_t \le 1$, the implicit constant is absolute. Note also that
$s_{t+1} \le \lambda_* s_t$. Rewriting the formula above,

\[ s_{t+1} = \lambda_* s_t \exp(-\lambda_* s_t/2 + O(s_t^2)). \tag{78} \]

We now simply 'guess' an approximate form for $s_t$ (obtained by solving a differential
equation, although things are not quite that simple): for $t \ge 1$, set

\[ r_t = \frac{2(1-\lambda_*)}{\lambda_*(\lambda_*^{-t} - 1)} \sim \frac{2\varepsilon}{\lambda_*^{-t} - 1}. \]

Since $a > -1$ implies $(1+a)^t - 1 \ge at$, we have $r_t \le r_1/t = 2/t$ for all $t$. In particular,
$r_t \le 1/2$ for $t \ge 4$. Also,

\[ \frac{\lambda_* r_t}{r_{t+1}} = \frac{\lambda_*(\lambda_*^{-t-1} - 1)}{\lambda_*^{-t} - 1} = \frac{\lambda_*^{-t} - \lambda_*}{\lambda_*^{-t} - 1} = 1 + \frac{1-\lambda_*}{\lambda_*^{-t} - 1}. \]

In particular, $r_{t+1} \le \lambda_* r_t$. Furthermore, for $t \ge 4$, which implies $r_t \le 1/2$, we have

\[ r_{t+1} = \lambda_* r_t(1 + \lambda_* r_t/2)^{-1} = \lambda_* r_t \exp(-\lambda_* r_t/2 + O(r_t^2)). \tag{79} \]

Using (78) and (79), it is now not hard to show that $s_t$ and $r_t$ remain close for all
large $t$. Set $T = \lfloor \varepsilon^{-2/3} \rfloor$, noting that $T \to \infty$ and $T = o(1/\varepsilon)$. Note that

\[ \lambda_*^{-T} = (1/\lambda_*)^T = (1 + \varepsilon + O(\varepsilon^2))^T = 1 + T\varepsilon + O(T^2\varepsilon^2 + T\varepsilon^2) = 1 + T\varepsilon(1 + O(\varepsilon^{1/3})), \]

so

\[ r_T,\ s_T = (1 + O(\varepsilon^{1/3}))\,2/T, \tag{80} \]

using (75) and (77) for $s_T$.

Let $\rho_t = s_t/r_t - 1$, noting that $\rho_T = O(\varepsilon^{1/3})$. Then, from (78) and (79),

\[ \begin{aligned}
1 + \rho_{t+1} &= (1 + \rho_t)\exp\bigl(-\lambda_*(s_t - r_t)/2 + O(r_t^2 + s_t^2)\bigr) \\
&= (1 + \rho_t)\exp\bigl(-\lambda_*\rho_t r_t/2 + O(r_t^2 + s_t^2)\bigr).
\end{aligned} \]

Since $r_t$ and $s_t$ are bounded, we have $\exp(O(r_t^2 + s_t^2)) \le 1 + M(r_t^2 + s_t^2)$ for some absolute
constant $M$. For $\varepsilon$ small and $t \ge T$ we have $r_t \le r_T \le 1/10$, say. It follows that whatever
the sign of $\rho_t$, the $\exp(-\lambda_*\rho_t r_t/2)$ term 'pulls $(1 + \rho_t)$ towards 1' without overshooting,
and hence that

\[ |\rho_{t+1}| \le |\rho_t| + (1 + |\rho_t|)M(r_t^2 + s_t^2). \]

Using $r_t \le \lambda_*^{t-T} r_T$ and $s_t \le \lambda_*^{t-T} s_T$, it follows that

\[ |\rho_t| \le |\rho_T| + 2M \sum_{0 \le s < t-T} \lambda_*^{2s}(r_T^2 + s_T^2), \]

provided $|\rho_s| \le 1$ for $T \le s \le t$. Since $r_T^2 \sim s_T^2 \sim 4/T^2 = \Theta(\varepsilon^{4/3})$, while $\sum_{s \ge 0} \lambda_*^{2s} =
O(1/\varepsilon)$, it follows easily that $|\rho_t|$ does remain bounded by 1, and in fact that $|\rho_t| =
O(\varepsilon^{1/3})$ uniformly in $t \ge T$. In particular, $s_t \sim r_t$ for $t \ge T$, proving the second
statement of the lemma. The next two statements follow.

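Finally, we turn to the estimate on $\prod_{t \ge 1}(1 - s_t)$. From (75) we see that $\sum_t |\tilde{s}_t - 2/t| =
\sum_t O(t^{-2})$ is bounded. It follows that $\sum_{t \ge 3} \log\bigl((1 - \tilde{s}_t)/(1 - 2/t)\bigr)$ converges; let us write $c$ for the
value of this sum, which does not involve $\varepsilon$. Since $T \to \infty$, the sum truncated at $T$
converges to $c$ as $\varepsilon \to 0$. Hence, from (77),

\[ \prod_{t \le T}(1 - s_t) = \prod_{t \le T}(1 - \tilde{s}_t + O(\varepsilon)) = e^{O(\varepsilon T)} \prod_{t \le T}(1 - \tilde{s}_t) \sim (1 - \tilde{s}_1)(1 - \tilde{s}_2)\,e^{c} \prod_{3 \le t \le T}(1 - 2/t) \sim \gamma_0 T^{-2}, \]

for some constant $\gamma_0$. On the other hand, comparison with an integral shows that
$\sum_{t > T} r_t = -2\log(\varepsilon T) + O(\varepsilon T) = -2\log(\varepsilon T) + o(1)$.

We have already seen that $\sum_{t > T} r_t^2 = o(1)$, and the same for $s_t$, so it follows that

\[ \begin{aligned}
\log \prod_{t > T}(1 - s_t) &= o(1) - \sum_{t > T} s_t = o(1) - (1 + O(\varepsilon^{1/3})) \sum_{t > T} r_t \\
&= 2\log(\varepsilon T) + o(1) + O(\varepsilon^{1/3}\log(\varepsilon T)) = 2\log(\varepsilon T) + o(1).
\end{aligned} \]

Thus $\prod_{t \ge 1}(1 - s_t) \sim \gamma_0 T^{-2}(\varepsilon T)^2 = \gamma_0\varepsilon^2$, as claimed. □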
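
The comparison between $s_t$ and the 'guessed' form $r_t$ in the proof of Lemma 31 can be observed numerically by iterating $s_{t+1} = 1 - e^{-\lambda_* s_t}$ from $s_0 = 1$. The following sketch is illustrative only; the function names are ours, and the value $\varepsilon = 10^{-3}$ is an arbitrary choice for which $\varepsilon^{-2/3} = 100$.

```python
import math


def survival_probability(lam: float) -> float:
    """Positive root s of s = 1 - exp(-lam * s), for lam > 1, by bisection."""
    def f(s: float) -> float:
        return -math.expm1(-lam * s) - s

    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def survival_ratios(eps: float, t_max: int) -> dict:
    """Return {t: s_t / r_t} for 1 <= t <= t_max in the subcritical process.

    s_t is the probability of surviving to generation t with Po(lam_star)
    offspring; r_t = 2 (1 - lam_star) / (lam_star (lam_star^{-t} - 1)).
    """
    lam = 1.0 + eps
    lam_star = lam * (1.0 - survival_probability(lam))
    ratios = {}
    s_t = 1.0  # s_0
    for t in range(1, t_max + 1):
        s_t = -math.expm1(-lam_star * s_t)
        r_t = 2.0 * (1.0 - lam_star) / (lam_star * (lam_star ** (-t) - 1.0))
        ratios[t] = s_t / r_t
    return ratios


if __name__ == "__main__":
    ratios = survival_ratios(1e-3, 5000)
    for t in (10, 100, 1000, 5000):
        print(t, ratios[t])
```

For $t \ge \varepsilon^{-2/3}$ the ratio stays close to 1, consistent with $|\rho_t| = O(\varepsilon^{1/3})$; for much smaller $t$ it is visibly below 1, since $r_t$ is only intended as an approximation for large $t$.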

(Constructing $\mathfrak{X}_\lambda$ first by constructing $\mathfrak{X}_\lambda^+$, and then adding the subcritical trees, we
see that $p_r = \prod_{t=1}^{r}(1 - s_t)$ is exactly the probability that $|X_r| = 1$ given that $|X_r^+| = 1$.
Fiddling around,

\[ \begin{aligned}
\mathbb{P}(|X_r| = 1 \mid |X_r^+| = 1) &= \mathbb{P}(|X_r^+| = 1 \mid |X_r| = 1)\,\mathbb{P}(|X_r| = 1)/\mathbb{P}(|X_r^+| = 1) \\
&= s\,\mathbb{P}(|X_r| = 1)/\mathbb{P}(|X_r^+| = 1) = s\,\mathbb{P}(|X_r| = 1)/(s\lambda_*^r) = \mathbb{P}(|X_r| = 1)/\lambda_*^r,
\end{aligned} \]

so the final statement of Lemma 31 is equivalent to the statement that for large $r$,
$\mathbb{P}(|X_r| = 1) \sim \gamma_0\varepsilon^2\lambda_*^r$ for some constant $\gamma_0$. There may be a simpler way of showing
this!)

The final statement of Lemma 31 shows that if we start one copy of $\mathfrak{X}_{\lambda_*}$ at each
time $t \ge 1$, the probability that for every $t$ the copy started at time $t$ dies within $t$ generations is
asymptotically $\gamma_0\varepsilon^2$.

Before turning to our next real lemma, let us get a simple observation out of the 
way. Trivially, E(|X t ~|) = A*; a simple inductive calculation gives the standard formula 
E(|X f ~| 2 ) = A*(l + A* + --- + A*) < e- l \l\ so Var(|X t - |) < e~^\X^\) 2 . If we start 
£a* with N > 10/ e particles in generation 0, and r < 1/e, then the size of generation 
r has expectation fi > N(l — e) l / s > N/3, and, using independence of the offspring 
of different particles, variance at most e~ 1 [i 2 /N < fi 2 /10. It follows by Chebychev's 
inequality that 

P(|X r 7| > N/6 I \X \ =N)> 1/2 (81) 

whenever N > 10/e and r < 1/e. 
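
The second moment formula quoted in the observation above can be verified by the 'simple inductive calculation' it refers to. The sketch below is illustrative only and uses our own function names. It relies on the fact that, given $|X_r^-| = k$, the next generation is Poisson with mean $\lambda_* k$, so that its conditional second moment is $\lambda_* k + \lambda_*^2 k^2$.

```python
def second_moment_by_recursion(lam_star: float, t: int) -> float:
    """E(|X_t|^2) for a Galton-Watson process with Po(lam_star) offspring, |X_0| = 1."""
    m1, m2 = 1.0, 1.0  # first and second moments of |X_0|
    for _ in range(t):
        # E(|X_{r+1}|) = lam_star * E|X_r|;
        # E(|X_{r+1}|^2) = lam_star * E|X_r| + lam_star^2 * E(|X_r|^2).
        m1, m2 = lam_star * m1, lam_star * m1 + lam_star ** 2 * m2
    return m2


def second_moment_closed_form(lam_star: float, t: int) -> float:
    """lam_star^t * (1 + lam_star + ... + lam_star^t)."""
    return lam_star ** t * sum(lam_star ** j for j in range(t + 1))


if __name__ == "__main__":
    for t in (1, 5, 50):
        print(t, second_moment_by_recursion(0.99, t), second_moment_closed_form(0.99, t))
```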

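Let $(D_t)_{t \ge 0}$ denote the union of countably many independent copies of $\mathfrak{X}_{\lambda_*}$, where
the $i$th process starts with a single particle in generation $i$. Thus $|D_0| = 1$, while given
$|D_t|$, the distribution of $|D_{t+1}|$ has the form $1 + \mathrm{Po}(\lambda_*|D_t|)$.

Lemma 32. Let $0 < \varepsilon < 1/10$ be given, and define $\lambda = 1+\varepsilon$ and $\lambda_* = \lambda(1 - s(\lambda))$
as usual. For $\omega \ge 20$ and $t \ge 0$ we have $\mathbb{P}(|D_t| \ge \omega/\varepsilon) = e^{-\Omega(\omega)}$, where the implied
constant is absolute. Furthermore, for $T \ge 1/\varepsilon$,

\[ \mathbb{P}(\exists t : 0 \le t \le T,\ |D_t| \ge \omega/\varepsilon) = O(\varepsilon T e^{-\Omega(\omega)}). \]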

Proof. Let $f_t(x) = \mathbb{E}x^{|D_t|}$ be the probability generating function of $|D_t|$. Then $f_0(x) =
x$, while from the relationship between $D_{t+1}$ and $D_t$ above we have

\[ f_{t+1}(x) = x f_t\bigl(e^{\lambda_*(x-1)}\bigr) \]

for all $t \ge 0$ and all $x$. Fix $t \ge 0$ and let $x_0 = 1 + \varepsilon/10$, say. Inductively defining $x_r$ by
$x_{r+1} = e^{\lambda_*(x_r - 1)} > 1$, note that

\[ f_t(x_0) = \prod_{r=0}^{t} x_r \le \prod_{r=0}^{\infty} x_r. \tag{82} \]

We claim that for every $r$ we have

\[ x_r \le 1 + (1 - \varepsilon/3)^r \varepsilon/10, \tag{83} \]

say. This certainly holds for $r = 0$. Suppose then that (83) holds for some particular
$r$. Since $\lambda_* \le (1 - \varepsilon/2)$, it follows that $x_{r+1} \le \exp((1 - \varepsilon/2)(1 - \varepsilon/3)^r \varepsilon/10)$. Using
$\exp(y) \le 1 + y + y^2$ for $y \le 1$, we thus have

\[ x_{r+1} \le 1 + (1 - \varepsilon/2)(1 - \varepsilon/3)^r \varepsilon/10 + (1 - \varepsilon/3)^r \varepsilon^2/100 \le 1 + (1 - \varepsilon/3)^{r+1} \varepsilon/10, \]

and (83) follows by induction.

Combining (82) and (83) we have, crudely, $\log(f_t(x_0)) \le 2\sum_r (1 - \varepsilon/3)^r \varepsilon/10 = 6/10$,
so $f_t(x_0) \le 2$. Recalling that $x_0 = 1 + \varepsilon/10$, we thus have $\mathbb{P}(|D_t| \ge \omega/\varepsilon) \le f_t(x_0)/x_0^{\omega/\varepsilon} \le
2e^{-\Omega(\omega)}$, and the first statement of the lemma follows, for all $\omega \ge 2$, say.

For the second statement, suppose now that $\omega \ge 20$. Using (81), and simply ignoring
the one new particle added in each generation, for $0 \le r \le 1/\varepsilon$, conditional on $|D_t| =
N \ge 10/\varepsilon$, the probability that $|D_{t+r}| \ge N/6$ is at least 1/2. Let $k = \lfloor 1/\varepsilon \rfloor$. Examining
$D_t, D_{t+1}, \ldots$, one by one, stopping the first time any of these sets has size more than
$\omega/\varepsilon$, it follows that

\[ \mathbb{P}\bigl(|D_{t+k}| \ge \omega/(6\varepsilon) \bigm| \exists t' : t \le t' \le t+k,\ |D_{t'}| \ge \omega/\varepsilon\bigr) \ge 1/2, \]

so

\[ \mathbb{P}(\exists t' : t \le t' \le t+k,\ |D_{t'}| \ge \omega/\varepsilon) \le 2\,\mathbb{P}(|D_{t+k}| \ge \omega/(6\varepsilon)) = e^{-\Omega(\omega)}, \]

using the first part for the final bound. Summing over $0 \le t \le T$ in steps of $k = \lfloor 1/\varepsilon \rfloor$,
the second statement follows. □
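
The inductive claim (83) and the resulting bound $f_t(x_0) \le 2$ from the proof of Lemma 32 can be checked by iterating $x_{r+1} = e^{\lambda_*(x_r - 1)}$ numerically. The sketch below is an illustration with our own function names. It works with $y_r = x_r - 1$ to avoid loss of precision, and assumes the usual relations $s = 1 - e^{-\lambda s}$ and $\lambda_* = \lambda(1 - s)$.

```python
import math


def survival_probability(lam: float) -> float:
    """Positive root s of s = 1 - exp(-lam * s), for lam > 1, by bisection."""
    def f(s: float) -> float:
        return -math.expm1(-lam * s) - s

    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def check_claim_83(eps: float, r_max: int = 5000):
    """Return (claim_holds, product) for the sequence x_0 = 1 + eps/10, x_{r+1} = exp(lam_star (x_r - 1)).

    claim_holds: whether x_r <= 1 + (1 - eps/3)^r eps/10 for all r <= r_max.
    product: the partial product of x_0, ..., x_{r_max}, which bounds f_t(x_0) for t <= r_max.
    """
    lam = 1.0 + eps
    lam_star = lam * (1.0 - survival_probability(lam))
    y = eps / 10.0        # y_r = x_r - 1
    bound = eps / 10.0    # (1 - eps/3)^r * eps/10
    claim_holds = True
    log_product = 0.0
    for _ in range(r_max + 1):
        claim_holds = claim_holds and (y <= bound)
        log_product += math.log1p(y)
        y = math.expm1(lam_star * y)
        bound *= 1.0 - eps / 3.0
    return claim_holds, math.exp(log_product)


if __name__ == "__main__":
    for eps in (0.01, 0.05, 0.09):
        print(eps, check_claim_83(eps))
```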

Next we shall show that conditioning $\mathfrak{X}_{\lambda_*}$ to survive to (at least) a certain time does
not increase its expected total size too much.

Lemma 33. Suppose that $\varepsilon > 0$ and $t \ge 1$. Let $N$ denote the total number of particles
in $\mathfrak{X}_{\lambda_*}$. Then $\mathbb{E}(N \mid X_t^- \ne \emptyset) \le (t+1)/\varepsilon$.

Proof. We shall use repeatedly the observation that for any $\mu$, the distribution of a
Poisson $\mathrm{Po}(\mu)$ random variable conditioned to be at least 1 is stochastically dominated
by $1 + \mathrm{Po}(\mu)$. (This may be seen by considering the first point, if any, of a Poisson
process in an interval.)

We may view the first generation of $\mathfrak{X}_{\lambda_*}$ as the union of two sets: the set $S_1$ consisting
of those children of the root that survive to time $t$, and the set $S_2$ of those that do not.
The full process is then obtained by taking a copy of the process conditioned to survive
for $t - 1$ generations for each particle in $S_1$, and a copy conditioned to die within
$t - 1$ generations for each in $S_2$. The sets $S_1$ and $S_2$ have independent Poisson sizes.
Conditioning on $X_t^-$ being non-empty is equivalent to conditioning on $|S_1| \ge 1$. Let us
instead simply add a new particle to $S_1$. By the observation at the start of the proof,
this gives a process whose distribution dominates that of $\mathfrak{X}_{\lambda_*}$ conditioned on $X_t^-$ being
non-empty. Our new process consists
exactly of the standard process $\mathfrak{X}_{\lambda_*}$, together with a copy of $\mathfrak{X}_{\lambda_*}$ conditioned to survive
at least $t - 1$ generations started at time 1.

Applying the same procedure to the new copy (i.e., to the children of the extra
particle in $S_1$, but not to those of the other particles in $S_1$), and continuing, it follows that
the distribution of $\mathfrak{X}_{\lambda_*}$ conditioned to survive to time $t$ is dominated by the distribution
of the union of $t + 1$ copies of $\mathfrak{X}_{\lambda_*}$, one started at each time $r$, $0 \le r \le t$. This has
expected total size $(t+1)/(1 - \lambda_*) \le (t+1)/\varepsilon$. □
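
The stochastic domination used at the start of the proof of Lemma 33 amounts to the inequality $\mathbb{P}(\mathrm{Po}(\mu) \ge k \mid \mathrm{Po}(\mu) \ge 1) \le \mathbb{P}(1 + \mathrm{Po}(\mu) \ge k)$ for every $k \ge 1$. The following sketch checks this numerically for a few values of $\mu$. It is an illustration only; the function names, the truncation point of the Poisson sum and the floating-point tolerance are our own choices.

```python
import math


def poisson_upper_tails(mu: float, k_max: int) -> list:
    """Return [P(X >= k) for k = 0..k_max] for X ~ Po(mu).

    The pmf is summed downwards from a far cut-off to avoid cancellation.
    """
    cutoff = k_max + 200
    pmf = [math.exp(-mu)]
    for j in range(cutoff):
        pmf.append(pmf[-1] * mu / (j + 1))
    tails = [0.0] * (cutoff + 2)
    for j in range(cutoff, -1, -1):
        tails[j] = tails[j + 1] + pmf[j]
    return tails[: k_max + 1]


def domination_holds(mu: float, k_max: int = 60, tol: float = 1e-12) -> bool:
    """Check P(X >= k | X >= 1) <= P(1 + X >= k) = P(X >= k - 1) for 1 <= k <= k_max."""
    tails = poisson_upper_tails(mu, k_max)
    return all(tails[k] / tails[1] <= tails[k - 1] + tol for k in range(1, k_max + 1))


if __name__ == "__main__":
    for mu in (0.01, 0.5, 1.0, 5.0):
        print(mu, domination_holds(mu))
```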

Finally, we observe that if we condition on $\mathfrak{X}_\lambda$ surviving, this process quickly realizes
its conditional expected size, which by Lemma 18 is a factor $(1+o(1))/s \sim 1/(2\varepsilon)$ larger
than the unconditioned size.

Lemma 34. Let $\lambda = 1+\varepsilon$, where $\varepsilon = \varepsilon(n) \to 0$. Let $\omega(n)$ and $\omega'(n)$ satisfy $\omega' \to \infty$
and $\omega/\omega' \to \infty$, and set $t_1 = \lfloor \log\omega/\log\lambda \rfloor$. Then $\mathbb{P}(|X_{t_1}| \ge \omega'/\varepsilon \mid (X_t) \text{ survives}) \to 1$
as $n \to \infty$.

Proof. This is a simple consequence of Lemma 23 together with (a weak form of) our
tail bound on the (a.s. defined) limit $Y = \lim_{t\to\infty} |X_t|/\lambda^t$. Starting with the tail bound,
set $x = 2\omega'/\omega = o(1)$. From Corollary 27 we have

\[ \mathbb{P}(0 < Y \le x/\varepsilon) \sim 4\varepsilon\, x^{\log(1/\lambda_*)/\log\lambda} = o(\varepsilon), \]

since $\log(1/\lambda_*)/\log\lambda \sim 1$. Recalling that $Y > 0$ if and only if the process survives, it
follows that $\mathbb{P}(Y \le x/\varepsilon \mid (X_t) \text{ survives}) = o(1)$.

Conditional on survival, there is some generation with size at least $\omega'/\varepsilon$ with probability
1. By Lemma 23, with probability $1 - o(1)$ the first such generation occurs at time
$\log(\omega'/\varepsilon)/\log\lambda - \log Y/\log\lambda + O(1/\varepsilon)$. By the tail bound on $Y$ above, this is less than
$t_1$ with probability $1 - o(1)$. Moreover, with probability $1 - o(1)$, from this point on $|X_t|$
is within a factor 2, say, of $\lambda^t Y$. At time $t_1$, $\lambda^{t_1}Y \ge 2\omega'/\varepsilon$ unless $Y < 2\varepsilon^{-1}\omega'/\omega = x/\varepsilon$,
an event of probability $o(1)$. □

4.5 Typical distances in the 2-core 

We are now almost ready to prove our lower bound on the diameter of G(n,p). It turns 
out that we need a result concerning typical distances in the 2-core. Unfortunately 
this does not seem to follow easily from any published results, and our proof is a little 
painful. We first need a result that essentially bounds the A;th moment of the size of the 
giant component. 

Let $G = G(n, \lambda/n)$. We say that a $k$-tuple $(x_1, \ldots, x_k)$ of not necessarily distinct
vertices of $G$ is useful if for each $i$, either $x_i$ is in a component of $G$ containing a cycle, or
it is joined by a path in $G$ to some other $x_j$. It turns out that almost all useful $k$-tuples
arise as they should, from vertices in the giant component.
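
The definition of a useful tuple can be made concrete with a short sketch. The Python code below is only an illustration under stated assumptions: the graph is simple and given as an edge list on vertices `0..n-1`, the helper names `components` and `is_useful` are hypothetical, and a repeated vertex is treated as joined to its other copy by a path of length 0 (the convention used later in the proof of Lemma 36).

```python
from collections import deque

def components(n, edges):
    """Label the connected components of a simple graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    comp = [-1] * n
    label = 0
    for start in range(n):
        if comp[start] != -1:
            continue
        comp[start] = label
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if comp[v] == -1:
                    comp[v] = label
                    queue.append(v)
        label += 1
    return comp

def is_useful(n, edges, xs):
    """Check whether the tuple xs (vertices may repeat) is useful."""
    comp = components(n, edges)
    n_vertices, n_edges = {}, {}
    for c in comp:
        n_vertices[c] = n_vertices.get(c, 0) + 1
    for u, _ in edges:
        n_edges[comp[u]] = n_edges.get(comp[u], 0) + 1
    for i, x in enumerate(xs):
        c = comp[x]
        # A connected simple graph contains a cycle iff it has at least
        # as many edges as vertices.
        has_cycle = n_edges.get(c, 0) >= n_vertices[c]
        # Another entry of the tuple (possibly the same vertex) in the same component.
        shares = any(comp[y] == c for j, y in enumerate(xs) if j != i)
        if not (has_cycle or shares):
            return False
    return True

# A path 0-1-2, a triangle 3-4-5 and an isolated vertex 6.
example_edges = [(0, 1), (1, 2), (3, 4), (4, 5), (5, 3)]
print(is_useful(7, example_edges, (0, 2)))  # both entries lie on the same path
print(is_useful(7, example_edges, (0, 6)))  # vertex 6 is isolated and unmatched
```

In this sketch the empty tuple is useful, matching the remark in the proof that a 0-tuple is always useful.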

Lemma 35. Let $\varepsilon = \varepsilon(n)$ satisfy $\Lambda = \varepsilon^3 n \to \infty$, and let $k \ge 1$ be fixed. Then the
expected number of useful $k$-tuples in $G(n, \lambda/n)$ is $(1+o(1))(2\varepsilon n)^k$.

Proof. The lower bound is immediate, since the number of useful $k$-tuples is at least
the $k$th power of the number $N$ of vertices in the largest component of $G$, and $\mathbb{E} N^k \ge
(\mathbb{E} N)^k \sim (2\varepsilon n)^k$.

Let $\psi = \psi(n)$ tend to infinity very slowly.

We first get a simple observation out of the way. Let us say that a $k$-tuple of vertices
is close if each $x_j$, $j > 1$, is within distance $\psi/\varepsilon$ of $x_1$ in $G$. Let $C_k$ denote the number of
close $k$-tuples. Set $t = \lfloor \psi/\varepsilon \rfloor$. Then $\mathbb{E} C_k = n\,\mathbb{E}(|G_{\le t}(x)|^{k-1})$, where $x = x_1$ is any fixed
vertex of $G$. Now $|G_{\le t}(x)|$ is stochastically dominated by $|X_{\le t}|$, the union of the first
$t$ generations of $\mathfrak{X}_\lambda$. It is easy to check (for example by calculating inductively) that
with $r$ fixed, $\mathbb{E}|X_{\le t}|^r = O(t^{2r-1}\lambda^{2rt}) = O(\psi^{2r-1} e^{2r\psi} \varepsilon^{-(2r-1)}) = \tilde{O}(\varepsilon^{-(2r-1)})$, where
$f = \tilde{O}(g)$ if $f$ is bounded by a function of $\psi$ times $g$. It follows that

\[ \mathbb{E} C_k = \tilde{O}(n\varepsilon^{-(2k-3)}) = o(\varepsilon^k n^k), \tag{84} \]

provided $\psi$ grows slowly enough, since $\varepsilon^{3k-3} n^{k-1} \to \infty$.

Turning to useful $k$-tuples, we shall proceed by induction on $k$. Let $U_k$ denote the
number of $k$-tuples of distinct vertices that are useful. Since $\varepsilon n \to \infty$, it suffices to prove
that $\mathbb{E} U_k \sim (2\varepsilon n)^k$. We may then bound the total number of useful $k$-tuples in terms
of $U_1, \ldots, U_k$.

From now on we insist that $x_1, \ldots, x_k$ are distinct. Let us say that a useful $k$-tuple
is reducible if it contains a non-empty subset $S$ which forms a close $r$-tuple within a
component of $G$ containing none of the remaining $x_j$. If this holds, then there is some
set of edges present witnessing that $S$ is a close $r$-tuple, and a disjoint set witnessing
the event that the remaining set $S^c$ is useful. (We may have $k - r = 0$; a 0-tuple is
always useful.) By the van den Berg-Kesten inequality [5], the probability of this is at
most the probability that $S$ is close times the probability that $S^c$ is useful. Using (84)
and the induction hypothesis, this probability is $o(\varepsilon^r) O(\varepsilon^{k-r}) = o(\varepsilon^k)$. Summing over
$r$ and over the $\binom{k}{r}$ sets $S$, we see that the expected number of reducible useful $k$-tuples
is $o(\varepsilon^k n^k)$.

Finally, we estimate the number of irreducible useful $k$-tuples. To do so, let us pick
$x_1, \ldots, x_k$ one by one; we do not fix them in advance. Each $x_i$ is chosen uniformly from
the remaining $n - i + 1$ vertices.

Having chosen $x_i$, let us explore its neighbourhoods as follows. First, if $x_i$ itself is in
the set $R$ of vertices previously reached by such explorations, we do not explore at all,
and declare $x_i$ to be 'atypical for reason 1'. Otherwise, we explore the neighbourhoods
of $x_i$ as usual, except that we do not (for the moment) test for edges to $R$. Also, we
stop as soon as either (i) we reach generation $\psi/\varepsilon$, or (ii) we find $\psi/\varepsilon$ vertices in one
generation $t$. (We then stop partway through this generation.) Let $\Gamma_i$ denote the set
of vertices reached. Our next step is to test all edges from $\Gamma_i$ to $R$; if such an edge is
present, $x_i$ is 'atypical for reason 2'. We then test for (non-tree) edges within the various
neighbourhoods in $\Gamma_i$. If we find one, $x_i$ is 'atypical for reason 3'. Finally, if we have
not yet labelled $x_i$ as atypical, then we label $x_i$ as 'good' if condition (i) or (ii) held,
and 'bad' otherwise, i.e., if we ran out of vertices to explore.

Note that if any $x_i$ is bad, then $\Gamma_i$ is its entire component, this component is a tree,
and every vertex of this tree is within distance $\psi/\varepsilon$ of $x_i$. If $(x_1, \ldots, x_k)$ is useful and
some $x_i$ is bad, then it follows that at least one later $x_j$ lies in $\Gamma_i$, so $(x_1, \ldots, x_k)$ is
reducible. Thus we may bound the expected number of irreducible useful $k$-tuples by
$n^k$ times the probability that no $x_i$ is bad. We do this by showing that the conditional
probability that $x_i$ is atypical or good given $x_1, \ldots, x_{i-1}$ and the associated explorations
is at most $(1+o(1))2\varepsilon$.

The definition of the exploration ensures that each $\Gamma_j$ contains at most $\psi^2/\varepsilon^2$ vertices,
so $|R| \le k\psi^2\varepsilon^{-2}$ and the probability that $x_i$ is atypical for reason 1 is $\tilde{O}(\varepsilon^{-2}n^{-1}) =
o(\varepsilon)$. Suppose this does not happen. Then $|\Gamma_i|$ is stochastically dominated by $|X_{\le \psi/\varepsilon}|$,
which has expectation $\sum_{r \le \psi/\varepsilon} \lambda^r = O(\varepsilon^{-1}\lambda^{\psi/\varepsilon}) = \tilde{O}(\varepsilon^{-1})$. At the end of the previous
exploration, we have already uncovered all edges incident with all vertices of each $\Gamma_j$,
$j < i$, except (possibly) for vertices in the last two generations. (Two because we may
have stopped part way through a generation.) There are at most $2\psi k/\varepsilon = \tilde{O}(\varepsilon^{-1})$ such
vertices in total. Hence, given $\Gamma_i$, the conditional probability that $x_i$ is atypical for reason
2 is at most $|\Gamma_i|\,\tilde{O}(\varepsilon^{-1}/n)$, so the unconditional probability is at most $\tilde{O}(\mathbb{E}|\Gamma_i|\,\varepsilon^{-1}/n) =
\tilde{O}(\varepsilon^{-2}n^{-1}) = o(\varepsilon)$.

Similarly, given $\Gamma_i$, the probability that $x_i$ is atypical for reason 3 is at most $|\Gamma_i|(\psi/\varepsilon)\lambda/n$,
since for each vertex we have to test edges to the at most $\psi/\varepsilon$ other vertices in the same
generation. Hence the probability that $x_i$ is atypical for this reason is also $o(\varepsilon)$.

Finally, the exploration leading to $\Gamma_i$ is dominated by $\mathfrak{X}_\lambda$, so the probability that $x_i$
is good is bounded by the probability that the branching process $\mathfrak{X}_\lambda$ either reaches size
$\psi/\varepsilon$, or lasts for at least $\psi/\varepsilon$ generations. It is easy to check that the probability of this
event is $(1+o(1))s \sim 2\varepsilon$, completing the proof. □

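Lemma 36. Let $\varepsilon = \varepsilon(n)$ satisfy $\Lambda = \varepsilon^3 n \to \infty$, and let $C$ denote the 2-core of
$G = G(n, \lambda/n)$. Then $N = |C|$ satisfies

\[ \mathbb{E} N^k \sim (2\varepsilon^2 n)^k \tag{85} \]

for each fixed $k$. Furthermore, if $d = \log\Lambda/\log\lambda - \omega/\varepsilon$ with $\omega = \omega(n) \to \infty$, then

\[ \mathbb{E} M_d^{(k)} = o(\varepsilon^{2k} n^k), \tag{86} \]

where $M_d^{(k)}$ is the number of $k$-tuples of vertices of $C$ some pair of which are within
distance $d$.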

One might expect the first statement to be known. Indeed, Pittel and Wormald [29]
have shown that the distribution of the size of the 2-core is asymptotically normal, with
mean $(2+o(1))\varepsilon^2 n$ and variance $(12+o(1))\varepsilon n = o(\varepsilon^4 n^2)$. Unfortunately, convergence in
distribution does not imply convergence of the relevant moments, so we cannot simply
deduce (85). We shall prove (85) using Lemma 35; it is then easy to deduce (86).
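
For readers who want to experiment with the quantity $N = |C|$, the 2-core can be computed by repeatedly deleting vertices of degree less than 2. The following Python sketch shows this standard peeling procedure; it is not taken from the paper, and the helper names `adjacency` and `two_core` are hypothetical.

```python
def adjacency(n, edges):
    """Adjacency sets of a simple graph on vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def two_core(adj):
    """Vertex set of the 2-core: the maximal subgraph with minimum degree >= 2."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    stack = [v for v, d in degree.items() if d < 2]
    while stack:
        v = stack.pop()
        if v in removed:
            continue
        removed.add(v)
        # Deleting v lowers the degree of each surviving neighbour.
        for w in adj[v]:
            if w not in removed:
                degree[w] -= 1
                if degree[w] < 2:
                    stack.append(w)
    return set(adj) - removed

# A triangle 0-1-2 with a pendant path 2-3-4: only the triangle survives.
print(two_core(adjacency(5, [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)])))
```

The second test case below (two triangles joined by a path) illustrates the characterisation used in the proof: a vertex is in the 2-core if and only if it is on a cycle or on a path joining two cycles.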

Proof. Fix $k$ distinct vertices $x_1, \ldots, x_k$ and let $A$ be the event that $x_1, \ldots, x_k$ are all
in the 2-core. It suffices to show that $\mathbb{P}(A) \le (1+o(1))(2\varepsilon^2)^k$.

Let $G' = G - \{x_1, \ldots, x_k\}$, so $G'$ has the distribution of $G(n', \lambda'/n')$ where $n' = n - k$
and $\lambda' \sim \lambda$. Let $U_r$ denote the number of useful $r$-tuples of not necessarily distinct
vertices of $G'$. By Lemma 35 we have

\[ \mathbb{E} U_r \le (1+o(1))(2\varepsilon n)^r \tag{87} \]

for any fixed $r$.

Suppose that $A$ holds, and let $E$ be a minimal set of edges witnessing $A$. Note that
every vertex of $S = \{x_1, \ldots, x_k\}$ meets at least two edges of $E$. Also, since a vertex is
in the 2-core if and only if it is on a cycle or on a path joining two cycles, $E$ may be
written as the union of $k$ graphs with maximum degree at most 3, so at most $3k^2$ edges
of $E$ meet $S$.

Let $E_0$, $E_1$ and $E_2$ denote respectively the set of edges of $E$ with both ends in $S$,
one end in $S$, and neither end in $S$. List the edges of $E_1$ as $a_i b_i$, $1 \le i \le r \le 3k^2$, where
each $a_i$ is in $S$ and each $b_i$ in $G'$. From the minimality of $E$, each $b_i$ is either joined to
some other $b_j$ by a path in $E_2$ (which may have length 0 if $b_i = b_j$), or is joined by a
path in $E_2$ to a cycle in $E_2$. (Otherwise, removing pendant edges from $E$, we obtain a
smaller witness to $A$.) It follows that the $r$-tuple $(b_1, \ldots, b_r)$ is useful in the graph $G'$.
Let $t = |E_0|$.

Suppose first that $t = 0$. (More precisely, suppose there is a witness $E$ with no
edges in $S$.) Then each $x_i$ meets at least two edges of $E_1$. Ignoring all but the first
two, we see that there is a $2k$-tuple $(b_1, \ldots, b_{2k})$ that is useful in $G'$, with $x_i$ joined to
$b_{2i-1}$ and $b_{2i}$. But from (87) and the independence of $G'$ and the edges between $S$ and
$G'$, the expected number of such $2k$-tuples is at most $(1+o(1))(2\varepsilon n)^{2k}(\lambda/n)^{2k} \sim 2^{2k}\varepsilon^{2k}$.
Since the $2k$-tuple is ordered, whenever there is one there are at least $2^k$ (swapping $b_1$
and $b_2$, etc), so the probability that a witness $E$ exists with no edges in $S$ is at most
$(1+o(1))(2\varepsilon^2)^k$.

It remains to show that the probability that there is a witness $E$ with $t > 0$ is $o(\varepsilon^{2k})$,
for which we simply bound the expected number of such witnesses. Since each vertex
of $S$ meets at least two edges of $E$, we have $r = |E_1| \ge 2k - 2t$, while, as noted above,
$r \le 3k^2$. Hence the expectation is bounded by

\[ \sum_{t=1}^{\binom{k}{2}} \binom{k}{2}^{t} (\lambda/n)^t \sum_{r=2k-2t}^{3k^2} k^r (\mathbb{E} U_r)(\lambda/n)^r, \]

since there are at most $\binom{k}{2}$ choices for each of the $t$ edges inside $S$, and, given $r$, at most $k^r$
possibilities for which of the $x_j$ each $a_i$ is. (Some $b_i$ may coincide, but we do not care.)
By (87), each term in the sum may be bounded by a constant times

\[ n^{-t}(2\varepsilon n)^r n^{-r} = O(n^{-t}\varepsilon^r) = O(n^{-t}\varepsilon^{2k-2t}). \]

Since $t \ge 1$ and $\varepsilon^2 n \to \infty$, the final bound is $o(\varepsilon^{2k})$. It follows that $\mathbb{P}(A) \sim (2\varepsilon^2)^k$,
completing the proof of (85).

Finally, as noted above, it is relatively easy to deduce (86) from (85). Let $M$ be the
number of $k$-tuples of vertices of $C$ in which every pair is at distance larger than $d$.
Then it suffices to show that $\mathbb{E} M \ge (1+o(1))(2\varepsilon^2 n)^k$. In proving such a lower bound,
we may consider $k$-tuples with additional properties that make the analysis easier.

Let $\psi = \psi(n) = o(\omega)$ tend to infinity very slowly, let $E$ be the branching process
event that at least two particles in generation 1 survive to generation $t = \psi/\varepsilon$, that
these particles each have at least $\psi/\varepsilon$ descendants in $X_t$, and that $|X_{t'}| \le \psi^{10}\varepsilon^{-1}\lambda^{\psi/\varepsilon} =
e^{O(\psi)}\varepsilon^{-1}$ for $0 \le t' \le t$. Recalling that, conditioned on survival, the branching process
typically has size of order $\varepsilon^{-1}\lambda^{t'}$ in generations $t'$ such that $t' \gg 1/\varepsilon$, it is easy to check
that $\mathbb{P}(E) \sim s^2/2 \sim 2\varepsilon^2$, the asymptotic probability that two particles in generation 1
survive. Also, Lemma 17 applies to all trees consistent with $E$.

Given distinct vertices $x_1, \ldots, x_k$ of $G$, let $E'_k$ be the event that for every $i$ the $t$-
neighbourhood of $x_i$ has the property corresponding to $E$, and these $t$-neighbourhoods
are disjoint. Also, let $E_k$ be the event that $E'_k$ holds, every $x_i$ is in the 2-core, and
$d(x_i, x_j) > d$ for all distinct $i$ and $j$. By Lemma 18 we have $\mathbb{P}(E'_k) \sim \mathbb{P}(E)^k \sim (2\varepsilon^2)^k$. Since
$\mathbb{E} M \ge (1+o(1)) n^k \mathbb{P}(E_k)$, it thus suffices to show that $\mathbb{P}(E_k \mid E'_k) = 1 - o(1)$.

But after testing whether $E'_k$ holds, we have not looked at any edges outside the
relevant neighbourhoods. The expected number of paths of length at most $d$ joining one
pair of vertices in the last generation of these neighbourhoods is bounded by

\[ \sum_{1 \le i \le d} n^{i-1}(\lambda/n)^i = n^{-1}\sum_{i \le d} \lambda^i = O(\varepsilon^{-1} n^{-1} \lambda^d). \]

There are at most $\binom{k}{2} e^{O(\psi)}\varepsilon^{-2}$ pairs to consider, so the probability of finding any such
path is at most

\[ e^{O(\psi)}\varepsilon^{-3} n^{-1}\lambda^d = e^{O(\psi)}\Lambda^{-1}\Lambda\lambda^{-\omega/\varepsilon} = e^{O(\psi)} e^{-(1+o(1))\omega} = o(1). \]

Also, since for each of $x_1, \ldots, x_k$ we have two neighbours with many (at least $\psi/\varepsilon$)
descendants in generation $t$, given $E'_k$ it is very likely that these neighbourhoods continue
to expand and eventually meet, so whp each $x_i$ is in $C$. Thus $\mathbb{P}(E_k \mid E'_k) = 1 - o(1)$, as
required. □

In fact, one can easily bound the expected number of pairs of vertices of $C$ at distance
significantly larger than $\log\Lambda/\log\lambda$, noting that all but at most $o(\varepsilon^4 n^2)$ such pairs also
have the property $E'_2$. Using Lemma 15 it is then easy to extend the argument above
to show that if $x$ and $y$ are chosen uniformly at random from $C$, then

\[ d(x,y) = \log\Lambda/\log\lambda + O_p(1/\varepsilon). \]

Furthermore, one can obtain the limiting distribution of the correction term without 
too much difficulty. We omit the details as this is not our focus, and Lemma 36 is all
we shall need to know about the 2-core. 

With the simple preliminaries of the last few subsections behind us, we are now 
ready to begin the proof of Theorem 3.



4.6 The lower bound on the diameter 

In this section we shall prove the lower bound on the diameter in Theorem 3. As noted
in Section 1, we may assume that $\varepsilon \to 0$. The argument we present will be rather
complicated. It is difficult to explain why this is the case, other than to say that we have
tried many promising simple approaches, and while several are extremely plausible, we
could not make the details rigorous. Of course, a much simpler proof may nevertheless
exist.

We must show that with high probability vertices x and y at large distance exist. 
In doing so we may focus on vertices x and y whose neighbourhoods satisfy certain 
restrictions, although if we are too restrictive, we will not get a good bound. Before 
turning to the graph, let us describe the corresponding restrictions on the branching 
process. Overall, our aim is to consider the event that a certain wedge condition holds,
and $t_{\omega/\varepsilon} > t$, for $t$ near $t_0 + t_1$, but to make our arguments work we need some additional
technical conditions. We start by insisting that the process $(X_t^+)$ consisting of those
particles with infinitely many descendants has size 1 for a large number of generations,
then bifurcates, and the non-surviving descendants of all the particles up to this point
have died out before very long, in a way to be made precise. This condition will include
an analogue of the weak wedge condition described in Subsection 2.2.






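For the rest of this section let $\varepsilon = \varepsilon(n)$ satisfy $\varepsilon \to 0$ and $\Lambda = \varepsilon^3 n \to \infty$. Set

\[ \omega = \Lambda^{1/6}, \]

and let

\[ t_1 = \lfloor \log\omega/\log\lambda \rfloor \]

and

\[ t_0 = \lfloor \log(\varepsilon^3 n)/\log(1/\lambda_*) \rfloor, \]

as before. (The rounding to integers will always be irrelevant in calculations.)

For $r, q = O(1/\varepsilon)$, set $T_0 = t_0 + r$ and $T_1 = t_0 + t_1 + q$, noting that $\log\lambda, \log(1/\lambda_*) \sim \varepsilon$,
so $T_0, T_1 = O(\varepsilon^{-1}\log\Lambda)$. We shall assume that $|r| \le t_0/2$ and that $|r|, |q| \le t_1/10$; these
conditions hold for $n$ sufficiently large.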
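
The asymptotics $s \sim 2\varepsilon$ and $\log\lambda, \log(1/\lambda_*) \sim \varepsilon$ used here are easy to check numerically. The Python sketch below assumes the standard facts that the survival probability $s$ of the Poisson branching process is the positive root of $1 - s = e^{-\lambda s}$ and that the dual parameter is $\lambda_* = \lambda(1-s)$; the function names are hypothetical and the bisection bracket is only suitable when $\lambda - 1$ is not extremely small.

```python
import math

def survival_probability(lam):
    """Positive root s of 1 - s = exp(-lam * s), for lam > 1, by bisection."""
    def g(s):
        # g(s) = (1 - exp(-lam * s)) - s, written with expm1 for accuracy near 0
        return -math.expm1(-lam * s) - s
    lo, hi = 1e-9, 1.0  # assumes g(lo) > 0 > g(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dual_parameter(lam):
    """The value lam_star < 1 with lam_star * exp(-lam_star) = lam * exp(-lam)."""
    return lam * (1.0 - survival_probability(lam))

for eps in (0.1, 0.01, 0.001):
    lam = 1.0 + eps
    s = survival_probability(lam)
    lam_star = dual_parameter(lam)
    # Each of the three ratios should approach 1 as eps decreases.
    print(eps, s / (2 * eps), math.log(lam) / eps, math.log(1 / lam_star) / eps)
```

All three printed ratios tend to 1 as $\varepsilon$ decreases, in line with the estimates $T_0, T_1 = O(\varepsilon^{-1}\log\Lambda)$.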

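Let $A = A_r$ be the event that $|X^+_{T_0}| = 1$ and $|X^+_{T_0+1}| = 2$. Then $\mathbb{P}(A) = s\,\mathbb{P}(Z_\lambda =
1)^{T_0}\,\mathbb{P}(Z_\lambda = 2)$, where, as before, $Z_\lambda$ is a Poisson with mean $s\lambda$ conditioned to be at
least 1. From (7) we have $\mathbb{P}(Z_\lambda = 1) = \lambda_*$, while from the definition of $Z_\lambda$ we have
$\mathbb{P}(Z_\lambda = 2)/\mathbb{P}(Z_\lambda = 1) = (s\lambda)/2 \sim \varepsilon$. Hence,

\[ \mathbb{P}(A) \sim 2\varepsilon\lambda_*^{T_0}\,\varepsilon\lambda_* \sim 2\varepsilon^2\lambda_*^{T_0} \sim 2\varepsilon^{-1}n^{-1}\lambda_*^{r} = \Theta(\varepsilon^{-1}n^{-1}). \tag{88} \]

When $A$ holds, let $x_i$ denote the unique particle in $X_i^+$ for $0 \le i \le T_0$, and let $y$, $y'$ be
the two particles in $X^+_{T_0+1}$.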

Let $B = B_r$ be the event that $A = A_r$ holds, and the following conditions are
satisfied:

(i) (the strong wedge condition) $x_0$ has no children other than $x_1$ and, for $1 \le i \le T_0$,
no children of $x_i$ other than $x_{i+1}$ or $y$, $y'$ have descendants in generation $2i$.

(ii) no particles in $X_{T_0+1}$ other than $y$ and $y'$ have descendants in $X_{T_1'}$, where $T_1' =
t_0 + \lfloor t_1/2 \rfloor$.

Note that $T_0 < T_1' < T_1$ and $\varepsilon(T_1' - T_0) \to \infty$. For the moment we could simply
write $T_1$ in place of $T_1'$ in condition (ii), but for the distribution it is convenient that $T_1'$
does not depend on $q$.

Unfortunately, it takes some effort to examine the effect that condition (i) has upon
the distribution of $t_{\omega/\varepsilon}$, the time the branching process takes to reach size $\omega/\varepsilon$. (Condi-
tion (ii) presents no problems.) Constructing $\mathfrak{X}_\lambda$ from $(X_t^+)$ by adding independent copies
of the subcritical process $\mathfrak{X}_{\lambda_*}$ starting at each particle, condition (i) says that for $i \le T_0$
the subcritical process started at $x_i$ dies by time $\max\{i, 1\}$ (measured from its starting
time), and condition (ii) that for $i \le T_0$ the process started from $x_i$ dies by time $T_1' - i$.
Writing $d_t = 1 - s_t = \mathbb{P}(|X_t^-| = 0)$ for the probability that $\mathfrak{X}_{\lambda_*}$ dies by time $t$, we thus
have

\[ \mathbb{P}(B \mid A) = d_1 \prod_{i=1}^{T_0} d_{\min\{i,\, T_1'-i\}}, \]

so

\[ d_1 \prod_{i=1}^{\min\{T_0,\, T_1'/2\}} d_i \ \ge\ \mathbb{P}(B \mid A) \ \ge\ d_1 \prod_{i=1}^{\infty} d_i \prod_{i=T_1'-T_0}^{\infty} d_i. \tag{89} \]

By Lemma EH as $\varepsilon i \to \infty$ we have $s_i \sim 2\varepsilon\lambda_*^i$, and so $\log(1 - s_i) \sim -2\varepsilon\lambda_*^i$. Since
$T_1' - T_0 = \lfloor t_1/2 \rfloor - r = t_1/2 + O(\varepsilon^{-1})$, we have $\varepsilon(T_1' - T_0) \to \infty$, so

\[ \sum_{i \ge T_1'-T_0} \log(1 - s_i) \sim -2\varepsilon \sum_{i \ge T_1'-T_0} \lambda_*^i = O(\lambda_*^{T_1'-T_0}) = o(1). \]

Hence, $\prod_{i \ge T_1'-T_0} d_i \sim 1$. Similarly, $\varepsilon\min\{T_0, T_1'/2\} = \varepsilon T_1'/2 \to \infty$, so $\prod_{i > \min\{T_0, T_1'/2\}} d_i \sim
1$. From (89) it then follows that

\[ \mathbb{P}(B \mid A) \sim d_1 \prod_{i=1}^{T_1'/2} d_i \sim d_1 \prod_{i=1}^{\infty} d_i \sim d_1\gamma_0\varepsilon^2 = e^{-\lambda_*}\gamma_0\varepsilon^2 \sim \gamma_0 e^{-1}\varepsilon^2, \tag{90} \]

using Lemma 31 to estimate the infinite product.

Let $C$ be the event that $A$ holds, and the particles $y$ and $y'$ each have at least $\omega'/\varepsilon$
descendants in $X_{T_1}$, where $\omega' = \sqrt{\omega} = \Lambda^{1/12}$. (Later we shall need to know that vertices
corresponding to $y$ and $y'$ have many descendants in $X_{T_1}$; this will ensure that $x_{T_0}$ is in
the 2-core.) By Lemma 34 applied with $T_1 - (T_0 + 1) = t_1 + q - r - 1 = t_1 + O(1/\varepsilon)$ in
place of $t_1$, i.e., with $\lambda^{t_1+q-r-1} = \Theta(\omega)$ in place of $\omega$, we have

\[ \mathbb{P}(C \mid A) = 1 + o(1). \]

We would like to impose the condition that $|X_t| < \omega/\varepsilon$ for $0 \le t \le T_1$; however, for
technical reasons we must consider the descendants of $x_{T_0}$ separately from the remaining
particles.

Let $D_1$ be the event that $A$ holds, and between them the particles $y$ and $y'$ have fewer
than $(\omega - 2\omega')/\varepsilon$ descendants in each set $X_t$, $T_0 + 1 \le t \le T_1$, noting that $\omega - 2\omega' \sim \omega$.
Conditioning on $A$, the trees of descendants of the two particles $y$, $y'$ form independent
copies of $\mathfrak{X}_\lambda$, each conditioned on the event that it survives. By Lemma 23, whp as
soon as the number $N_r(y)$ of descendants of $y$ in $X_{T_0+1+r}$ is large compared to $\varepsilon^{-1}$,
it then remains close to $Y\lambda^r$, where $Y = \lim N_r(y)/\lambda^r$ has the distribution of $Y = Y_\lambda$
conditioned to be positive. Let $Y_2$ have the distribution of the sum of two independent
copies of $Y$ each conditioned to be positive. Then it follows that

\[ \mathbb{P}(D_1 \mid A) = o(1) + \mathbb{P}\bigl(Y_2\lambda^{T_1-T_0-1} < (\omega - 2\omega')/\varepsilon\bigr). \]

Now $\lambda^{T_1-T_0-1} = \lambda^{t_1+q-r-1} = \omega\lambda^{q-r+O(1)} \sim \omega\lambda^{q-r}$, and $\omega - 2\omega' \sim \omega$, so

\[ \mathbb{P}(D_1 \mid A) = o(1) + \mathbb{P}\bigl(Y_2 < (1+o(1))\lambda^{r-q}/\varepsilon\bigr) = o(1) + \mathbb{P}\bigl(sY_2 < (2+o(1))e^{\varepsilon(r-q)}\bigr), \]

recalling that $s \sim 2\varepsilon$ and noting that, since $\varepsilon(r-q)$ is bounded and $\lambda = 1 + \varepsilon + O(\varepsilon^2)$, we
have $\lambda^{r-q} \sim \exp(\varepsilon(r-q))$. In a moment we shall sum over $r$; we can evaluate the sum
of the corresponding terms above by relating it to a certain disjoint union of events and
using Theorem 29. While this is aesthetically pleasing, we in fact know the asymptotic
distribution of $Y_2$, so we shall just use it.

Recall from Lemma 19 and Corollary 25 that $sY$ conditioned on $Y > 0$ has the dis-
tribution of $Y^+ = Y_\lambda^+$, which converges in distribution to an exponential with parameter
1 as $\varepsilon \to 0$. It follows that $sY_2$ converges in distribution to the sum of two independent
such exponentials, which has distribution function $F(x) = \int_{y=0}^{x} y e^{-y}\,dy = 1 - (x+1)e^{-x}$.
Thus

\[ \mathbb{P}(D_1 \mid A) = F(2e^{r'-q'}) + o(1), \]

where $r' = \varepsilon r$ and $q' = \varepsilon q$ and we use uniform continuity to remove the $(1+o(1))$ factor
in the argument of $F$.

Since $r' - q' = O(1)$, we thus have $\mathbb{P}(D_1 \mid A) = \Theta(1)$, and hence the above equation
can be written as $\mathbb{P}(D_1 \mid A) \sim F(2e^{r'-q'})$. Since $\mathbb{P}(C \mid A) = 1 - o(1)$, it follows that
$\mathbb{P}(C \cap D_1 \mid A) \sim F(2e^{r'-q'})$. Given $A$, the events $B$ and $C \cap D_1$ are independent, so

\[ \mathbb{P}(C \cap D_1 \mid A \cap B) \sim F(2e^{r'-q'}). \tag{91} \]

Turning to particles other than the descendants of $y$, $y'$, first let $D_1'$ be the event
that $A$ holds and, for $T_0 < t \le T_1$, the set $X_t$ contains at most $\omega'/\varepsilon$ particles that are
descendants of $x_{T_0}$ but not of $y$ or $y'$. Given $A \cap B$, these particles form a copy of $\mathfrak{X}_\lambda$
starting at $x_{T_0}$ and conditioned to die within $T_1' - T_0$ generations. This process may be
viewed as $\mathfrak{X}_{\lambda_*}$ conditioned to die by a certain time, so its distribution is dominated by
that of $\mathfrak{X}_{\lambda_*}$. It follows easily (for example from the bound for $D_2$ we shall prove in a
moment) that $\mathbb{P}((D_1')^c \mid A \cap B) = o(1)$.

Let $D_2$ be the event that $A$ holds and, for $0 \le t \le T_1$, the set $X_t$ contains at
most $\omega'/\varepsilon$ particles that are not descendants of $x_{T_0}$. Given $A \cap B$, the tree of particles
that are not descendants of $x_{T_0}$ has the distribution of one copy of $\mathfrak{X}_{\lambda_*}$ started at each
time $t$, $0 \le t \le T_0$, conditioned on the various copies of $\mathfrak{X}_{\lambda_*}$ dying by various times.
This distribution is dominated by that studied in Lemma 32, so by Lemma 32 we have
$\mathbb{P}(D_2^c \mid A \cap B) = O(\varepsilon T_1 e^{-\Omega(\omega')}) = o(1)$, recalling that $T_1 = O(\varepsilon^{-1}\log\Lambda)$.

Let $D = D_1 \cap D_1' \cap D_2$. Since $\mathbb{P}((D_1')^c \cup D_2^c \mid A \cap B) = o(1)$, from (91) we have

\[ \mathbb{P}(C \cap D \mid A \cap B) \sim F(2e^{r'-q'}). \]

Note for later that if $D$ holds, then $|X_t| < \omega/\varepsilon$ for $t \le T_1$.

Finally, setting $E_{r,q} = A \cap B \cap C \cap D$, and recalling (88) and (90), we have

\[ \mathbb{P}(E_{r,q}) \sim 2\varepsilon^{-1}n^{-1}\lambda_*^{r}\,\gamma_0 e^{-1}\varepsilon^2 F(2e^{r'-q'}) \sim 2\gamma_0 e^{-1}\varepsilon e^{-r'}F(2e^{r'-q'})/n. \]

Since this estimate holds uniformly in $r$, $q$ with $r, q = O(1/\varepsilon)$, it also holds uniformly
in $r$, $q$ with $|q|, |r| \le 2M/\varepsilon$, say, for some function $M = M(n)$ tending to infinity. For
$|q| \le M/\varepsilon$, let $E_q = \bigcup_{-2M/\varepsilon \le r \le 2M/\varepsilon} E_{r,q}$. For fixed $q$, the events $E_{r,q}$ are disjoint, so we
have

\[ \mathbb{P}(E_q) \sim 2\gamma_0 e^{-1}\varepsilon n^{-1} \sum_{-2M/\varepsilon \le r \le 2M/\varepsilon} e^{-r'}F(2e^{r'-q'})
= 2\gamma_0 e^{-1}\varepsilon n^{-1}e^{-q'} \sum_{-2M/\varepsilon - q \le r-q \le 2M/\varepsilon - q} e^{-(r'-q')}F(2e^{r'-q'}). \]

The sum above simplifies considerably, since it corresponds to splitting a single event
according to the time that $(X_t^+)$ first subdivides. Rather than using this observation,
we simply calculate. Since $F(x) = O(1)$ as $x \to \infty$ and $F(x) = O(x^2)$ as $x \to 0$, the
sum above has exponentially decaying tails. Recalling that $r'$ and $q'$ simply denote $\varepsilon r$
and $\varepsilon q$, it follows easily that

\[ \mathbb{P}(E_q) \sim 2\gamma_0 e^{-1}n^{-1}e^{-\varepsilon q} \int_{-\infty}^{\infty} e^{-x}F(2e^{x})\,dx. \]

A simple computation shows that the integral evaluates to 2, so

\[ \mathbb{P}(E_q) \sim 4\gamma_0 e^{-1}n^{-1}e^{-\varepsilon q} \sim 4\gamma_0 e^{-1}n^{-1}\lambda_*^{q}, \]

uniformly in $|q| \le M/\varepsilon$, provided $M = M(n)$ tends to infinity sufficiently slowly.
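
The value of the integral can be confirmed numerically. The Python sketch below is only a sanity check, not part of the argument: it takes $F(x) = 1 - (x+1)e^{-x}$ as above and applies a composite Simpson rule on $[-20, 20]$. The truncation interval and step count are arbitrary choices, and the neglected tails are of order $e^{-20}$.

```python
import math

def F(x):
    """Distribution function of the sum of two independent Exp(1) variables."""
    # 1 - (x + 1) * exp(-x), rearranged with expm1 to limit cancellation for small x
    return -math.expm1(-x) - x * math.exp(-x)

def integrand(x):
    return math.exp(-x) * F(2.0 * math.exp(x))

def simpson(f, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

value = simpson(integrand, -20.0, 20.0, 8000)
print(value)  # close to 2
```

The same value follows analytically from the substitution $u = 2e^{x}$, which turns the integral into $2\int_0^\infty F(u)u^{-2}\,du$, and an integration by parts using $F'(u) = ue^{-u}$.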

Note that the event $E_q$ requires that $y, y' \in X^+_{T_0+1}$, an event depending on an infinite
number of generations of the process $\mathfrak{X}_\lambda$. To work with the graph, we seek an event
depending on a finite number of generations of $\mathfrak{X}_\lambda$. Let $F_q$ be the event corresponding
to $E_q$ but depending only on the first $T_1 = t_0 + t_1 + q$ generations. More precisely,
$F_q$ is the event that there are exactly two particles, $y$ and $y'$, say, in some generation
$T_0 + 1 = t_0 + r + 1$, $-2M/\varepsilon \le r \le 2M/\varepsilon$, with descendants in generation $T_1$, each of
these particles has at least $\omega'/\varepsilon$ descendants in $X_{T_1}$, $y$ and $y'$ have a common parent
$x_{T_0}$, the equivalent of the strong wedge condition (i) holds, and $D = D_1 \cap D_1' \cap D_2$
holds. From the strong wedge condition, if $F_q$ holds then, in the tree obtained from $\mathfrak{X}_\lambda$
by deleting all descendants of $x_{T_0}$, the initial particle is the unique particle at maximum
distance from $x_{T_0}$.

If $E_q$ holds, then so does $F_q$. Furthermore, $\mathbb{P}(E_q \mid F_q) = 1 + o(1)$, since for each of $y$
and $y'$, the probability that none of its at least $\omega'/\varepsilon$ descendants in generation $T_1$ goes
on to survive for ever is $O((1-s)^{\omega'/\varepsilon}) = o(1)$. Hence,

\[ \mathbb{P}(F_q) \sim \mathbb{P}(E_q) \sim 4\gamma_0 e^{-1}n^{-1}\lambda_*^{q}. \]

Let $T$ be a tree of height $t = T_1$ consistent with $F_q$. Then $t = O(\varepsilon^{-1}\log\Lambda)$, while,
since $D$ holds, each generation contains at most $\omega\varepsilon^{-1} = \omega\Lambda^{-1/3}n^{1/3} = o(n^{1/3})$ vertices.
Also, the total size $|T|$ of $T$ is $O(\omega\varepsilon^{-2}\log\Lambda) = O(\omega\Lambda^{-2/3}n^{2/3}\log\Lambda) = o(n^{2/3})$, and
$\varepsilon|T|^2 = O(\omega^2\varepsilon^{-3}\log^2\Lambda) = O(\omega^2\Lambda^{-1}n\log^2\Lambda) = o(n)$. Lemma 17 applies to all such
trees, telling us that

F(G< t (x) = T) ~ P(G< t (x) = T) ~ P(A< t ^ r). 

Let $F_q(x)$ denote the event that $G_{\le T_1}(x)$ is a tree satisfying the property $F_q$, where
$T_1 = t_0 + t_1 + q$. Summing over all such trees, we see that

\[ \mathbb{P}(F_q(x)) \sim \mathbb{P}(F_q) \sim 4\gamma_0 e^{-1}n^{-1}\lambda_*^{q} \tag{92} \]

uniformly in $q$ such that $|\varepsilon q| \le M$, for some $M \to \infty$.

Let $q_0$ be chosen so that $\varepsilon q_0$ tends to minus infinity very slowly, and let $F(x) =
F_{q_0}(x)$. Let $N$ be the number of vertices $x$ for which $F(x)$ holds; then

\[ \mathbb{E} N = n\,\mathbb{P}(F_{q_0}(x)) \sim 4\gamma_0 e^{-1}\lambda_*^{q_0} \to \infty. \]

We are now almost finished: it remains to use a second moment argument to show 
that N is whp large, and then to bound the probability that two vertices satisfying the 
relevant condition are close. 
Given distinct vertices $x$ and $y$ of $G = G(n, \lambda/n)$, let $A(x,y)$ be the event that $F(x)$
and $F(y)$ both hold, with the trees 'witnessing' this being disjoint. For trees $T_1$ and
$T_2$ consistent with $F_{q_0}$, by Lemma 18 the probability that the relevant neighbourhoods
of $x$ and $y$ are disjoint and isomorphic to $T_1$ and $T_2$ respectively is asymptotically the
product of the individual probabilities. It follows easily that

\[ \mathbb{P}(A(x,y)) \sim \mathbb{P}(F(x))\,\mathbb{P}(F(y)) = \mathbb{P}(F(x))^2. \tag{93} \]

At this point, it seems that there should be a simple argument involving 'pulling the
trees off the 2-core and reattaching them randomly'. However, once again, we did not
manage to make such an argument precise in a simple way.

Our next aim is to show that it is very unlikely that $F(x)$ and $F(y)$ hold and
the trees witnessing these events overlap. Recall that if $F(x)$ holds, then there is a
unique 'first' vertex in the neighbourhoods of $x$ with two children with descendants in
generation $t_0 + t_1 + q$. Let $x'$ denote this vertex. Since the two children of $x'$ each have
at least $\omega'/\varepsilon = \Lambda^{1/12}/\varepsilon$ descendants in generation $t_0 + t_1 + q$, with probability at least
$1 - o(\Lambda^{-100})$, say, their neighbourhoods continue to grow, and eventually meet, in which
case $x'$ is in the 2-core. Let $\tilde{F}(x)$ be the event that $F(x)$ holds and $x'$ is in the 2-core,
so $\mathbb{P}(\tilde{F}(x)) \sim \mathbb{P}(F(x))$. Also, let $B_1$ be the 'global bad event' that there is some vertex
$x$ such that $F(x)$ holds but $x'$ is not in the 2-core. Then

\[ \mathbb{P}(B_1) \le n\,\mathbb{P}(F(x))\,o(\Lambda^{-100}) = o(\lambda_*^{q_0}\Lambda^{-100}) = o(1), \]

assuming, as we may, that $\varepsilon q_0 \ge -\log\log\Lambda$, say.

Similarly, if $F(x)$ and $F(y)$ hold, then it is very likely that $x$ and $y$ are in the same
component. Writing $B_2$ for the event that there are $x$ and $y$ in different components
such that $F(x)$ and $F(y)$ hold, we have $\mathbb{P}(B_2) = o(1)$.

For our second moment bound, we will study $\tilde{N}$, the number of vertices $x$ such that
$\tilde{F}(x)$ holds. Note that whp $\tilde{N}$ is equal to $N$, since $B_1$ has probability $o(1)$. Also,

\[ \mathbb{E}\tilde{N} = n\,\mathbb{P}(\tilde{F}(x)) \sim n\,\mathbb{P}(F(x)) = \mathbb{E} N \sim 4\gamma_0 e^{-1}\lambda_*^{q_0} \to \infty. \]

Let $\tilde{A}(x,y)$ denote the event that $\tilde{F}(x)$ and $\tilde{F}(y)$ hold, with the trees witnessing $F(x)$
and $F(y)$ disjoint. If $\tilde{A}(x,y)$ holds, then so does $A(x,y)$. On the other hand, continuing
to explore as before, we see that given $A(x,y)$, the vertices $x'$ and $y'$ are likely to be in
the 2-core, so

\[ \mathbb{P}(\tilde{A}(x,y)) \sim \mathbb{P}(A(x,y)) \sim \mathbb{P}(F(x))^2 \sim \mathbb{P}(\tilde{F}(x))^2. \tag{94} \]

It remains to consider the case of overlapping trees.

We defined $F(x)$ in such a way that if $F(x)$ holds, then $x'$ together with the com-
ponent of $G - x'$ containing $x$ forms a tree, in which $x$ is the unique vertex at maximal
distance from $x'$. If $\tilde{F}(x)$ holds, so $x'$ is in the 2-core, then $x$ is the unique vertex of
this tree at maximal distance from the 2-core. Let $T_x$ denote this tree, or, in general,
the tree component containing $x$ if we delete from $G$ all edges lying in the 2-core. If
$\tilde{F}(x)$ and $\tilde{F}(y)$ both hold, then from this uniqueness property, the trees $T_x$ and $T_y$ are
disjoint, except possibly at $x'$ and $y'$: they are two different trees attached to the 2-core.

Let $B(x,y)$ be the event that $\tilde{F}(x) \cap \tilde{F}(y)$ holds and the trees $T_x$ and $T_y$ are disjoint
(except possibly at $x'$ and $y'$), but the trees witnessing $F(x)$ and $F(y)$ overlap. From
the remarks above, for $x \ne y$,

\[ \tilde{F}(x) \cap \tilde{F}(y) = \tilde{A}(x,y) \cup B(x,y). \tag{95} \]
To bound $\mathbb{P}(B(x,y))$, we first test whether $F(x)$ (not $\tilde{F}(x)$) holds, in a way that first
uncovers the tree $T_x$. Roughly speaking, we would like to show that the number of trees
$T_x$ hanging off the 2-core is well behaved (i.e., its second moment is not too large). Then
we could say that the attachment points to the 2-core are uniformly distributed, so it's
unlikely that there are two trees attached to close points. The problem is that we need
independence to get the second moment bound, and we do not have this, as we can't
tell in advance when we have reached the 2-core and should stop exploring the tree from
$x$. To get around this, we choose a stopping vertex in advance.

Given distinct vertices $x$ and $\bar{x}$, let $F(x; \bar{x})$ be the event that $F(x)$ holds, with the
division vertex $x'$ equal to $\bar{x}$. Note that $F(x)$ is the disjoint union of the events $F(x; \bar{x})$,
$\bar{x} \in V(G) \setminus \{x\}$, all of which are equally likely. Thus

\[ \mathbb{P}(F(x; \bar{x})) = (n-1)^{-1}\,\mathbb{P}(F(x)) \sim n^{-1}\,\mathbb{P}(F(x)). \tag{96} \]

Let $T(x; \bar{x})$ be the event that $\bar{x}$, together with the component of $G - \bar{x}$ containing $x$,
forms a tree consistent with $F(x; \bar{x})$. Note that we do not insist that $\bar{x}$ is in fact in the
2-core, and that if $F(x; \bar{x})$ holds then $T(x; \bar{x})$ must hold.

Crucially, we may test whether $T(x; \bar{x})$ holds by exploring the neighbourhoods of $x$
in the usual way, except that if we reach $\bar{x}$ at some point, we do not test for edges from
$\bar{x}$ to unseen vertices. (Since we require the relevant neighbourhood to be a tree, we do
test for edges between all pairs of reached vertices.) Also, given $T(x; \bar{x})$, we may test
whether $F(x; \bar{x})$ holds by continuing to explore from $\bar{x}$; the property required of this
further exploration is captured by $C \cap D_1 \cap D_1'$ above (this was the reason for 'splitting
$D_1'$ off' from $D_2$), and has probability essentially $\Theta(\varepsilon^2)$. More precisely, this probability
$p$ depends on $q$ (i.e., on $t = T_1$), and also on the distance $t_0 + r$ from $x$ to $\bar{x}$, as well as
very slightly on the total number of vertices of $T_x$. (Not too much, as our definitions
ensure that $|T_x| = o(n^{2/3})$.) With $r$ and $q$ fixed, the calculations leading to our estimate
of $\mathbb{P}(F_q)$ show that $p \sim p_{r,q}$, where $p_{r,q} = \Theta(\varepsilon^2)$ holds uniformly for $|r|, |q| = O(1/\varepsilon)$.
However, we consider $q$ in some range $|q| \le M(n)/\varepsilon$, and $r$ in the range $|r| \le 2M(n)/\varepsilon$,
where $M(n) \to \infty$. Let $\psi = \psi(n)$ be a function tending to infinity arbitrarily slowly (more
slowly than the implicit function in the $o(\cdot)$ notation in (86)), and let us write $f = \tilde{\Theta}(g)$
if $f/g = \psi^{O(1)}$. Taking $M(n)$ to tend to infinity sufficiently slowly, we may assume that
$p_{r,q} = \tilde{\Theta}(\varepsilon^2)$, so we have

\[ \mathbb{P}(F(x; \bar{x}) \mid T(x; \bar{x})) = \tilde{\Theta}(\varepsilon^2) \]

whenever $T(x; \bar{x})$ holds. From (96) it follows that for all $x \ne \bar{x}$ we have

\[ \mathbb{P}(T(x; \bar{x})) = \tilde{\Theta}(\varepsilon^{-2}n^{-1}\,\mathbb{P}(F(x))) = \tilde{\Theta}(\varepsilon^{-2}n^{-2}), \]

recalling that $\mathbb{P}(F(x)) = \Theta(n^{-1}\lambda_*^{q_0}) = \tilde{\Theta}(n^{-1})$.

Given $x \ne y$ and $\bar{x}$, $\bar{y}$, let $B'(x, y, \bar{x}, \bar{y})$ be the event that $T(x; \bar{x}) \cap T(y; \bar{y})$ holds,
with the trees $T_x$ and $T_y$ disjoint. Note that if this event holds, then $\bar{x}, \bar{y} \notin \{x, y\}$. We
may test whether $B'(x, y, \bar{x}, \bar{y})$ holds by exploring from $x$ and $y$ respectively (with the
explorations modified at $\bar{x}$ and $\bar{y}$), and the two explorations cannot 'help' each other.
Arguing as for (93) above, using Lemma 18, it follows that

\[ \mathbb{P}(B'(x, y, \bar{x}, \bar{y})) \sim \mathbb{P}(T(x; \bar{x}))\,\mathbb{P}(T(y; \bar{y})) = \tilde{\Theta}(\varepsilon^{-4}n^{-4}) \]

for all $\bar{x}, \bar{y} \notin \{x, y\}$; the probability is 0 if $\bar{x}$ or $\bar{y} \in \{x, y\}$.

Fix vertices $x \ne y$, and let $\bar{x}$ and $\bar{y}$ be chosen independently and uniformly at random
from $V(G)$. Note that

\[ \mathbb{P}(B'(x, y, \bar{x}, \bar{y})) = \tilde{\Theta}(\varepsilon^{-4}n^{-4}). \tag{97} \]

Let us condition on $B'(x, y, \bar{x}, \bar{y})$. Moreover, we condition on $V_x = V(T_x) \setminus \{\bar{x}\}$, on
$V_y = V(T_y) \setminus \{\bar{y}\}$ and on the structure of the trees $T_x$ and $T_y$, but not on $\bar{x}$ and $\bar{y}$. Given
this information, $\bar{x}$ and $\bar{y}$ are independent and uniform from $V' = V(G) \setminus (V_x \cup V_y)$.
Indeed, the given information says that certain trees $T_x$ and $T_y$ are attached to $\bar{x}, \bar{y} \in
V'$. Each tree is equally likely to be attached to any vertex of $V'$, so, given this, the
attachment vertices are uniform on $V'$.

The event we have conditioned on does not depend on the edges in $V'$. Hence, the
conditional distribution of $G[V']$ is that of $G' = G(n', \lambda/n)$, where $n' = n - |T_x| + 1 -
|T_y| + 1$. From the definition of $F_{q_0}$, we have $|T_x|, |T_y| = o(n^{2/3})$, so $n' = n - o(n^{2/3})$.
The edge probabilities in $G'$ are thus $\lambda'/n'$ where

\[ \lambda' = \lambda n'/n = (1+\varepsilon)(n - o(n^{2/3}))/n = 1 + \varepsilon - o(n^{-1/3}) = 1 + \varepsilon', \]

with $\varepsilon' \sim \varepsilon$.

Let $B'(x,y)$ be the event that $B(x,y)$ holds, and $\bar{x} = x'$, $\bar{y} = y'$, so

\[ \mathbb{P}(B'(x,y)) = n^{-2}\,\mathbb{P}(B(x,y)). \tag{98} \]

If $B'(x,y)$ holds, then so does $B'(x, y, \bar{x}, \bar{y})$. Furthermore, $\bar{x}$ and $\bar{y}$ must be in the 2-core
of $G$, which is the same as the 2-core $U$ of $G'$. Also, $\bar{x}$ and $\bar{y}$ must be close, i.e., within
distance $d = 2t_1 + 4M(n)/\varepsilon \sim 2t_1$.

From the remarks above, we may bound F(B'(x, y) \ B'(x, y, x, y)) by the conditional 
probability (given the trees T x , T y etc but not x, y) that x and y are close in U, and 
hence by 

\G'\~ 2 EM d (G') ~ n~ 2 KM d (G'), 

where $M_d(G')$ is the number of close pairs in $U$, and the expectation is over the random
graph $G'$.

Now G' has the distribution of G(n',A'/n'), with A' = 1 + e' and e' ~ e. Also, 
d ~ 2ti ~ 21ogw/e = log(e 3 n)/(3e) ~ 3 -1 log((e / )V)/e'. By Lemma El we thus 
have EMrf(G') = o(e 4 n 2 ). Taking our slowly growing function ifi(n) small enough, the 
expectation is smaller than e 4 n 2 by at least a factor e^, say. It follows that 

F(B'(x,y) I B'(x,y,x,y)) < @(e 4 e^). 

Using ([97D it follows that F(B'(x,y)) = din^e^) = o(n" 4 ), and hence, from (|98]l . 
P(S(x,y)) = o(n~ 2 ). 

Using (|95p . and recalling that A r denotes the number of vertices x such that F(x) 
holds, we have 

X y^ X 

using (fMj) . 
Since $\mathbb{E}\tilde N \to \infty$, it follows that $\mathbb{E}\tilde N^2 \sim (\mathbb{E}\tilde N)^2$, and hence that $\tilde N$ is concentrated
about its mean. Since $\mathbb{E}N \sim \mathbb{E}\tilde N$, and $N$ and $\tilde N$ are whp equal, we thus have $N$
concentrated about its mean also, where $N$ is the number of $x$ such that $F(x)$ holds.

Finally, the end of the proof is as in Section 2. Set $t_2 = \log(\varepsilon^3 n/\omega^2)/\log\lambda$, let
$K\varepsilon \to \infty$ very slowly, let $N$ be the number of vertices $x$ for which $F(x)$ holds, and let
$M$ be the number of pairs $x$, $y$ for which $F(x)$ and $F(y)$ hold disjointly, but $d(x,y) < d$,
where

\[ d = 2(t_0 + t_1 + q) + t_2 - K = 2\frac{\log(\varepsilon^3 n)}{\log(1/\lambda_*)} + 2\frac{\log\omega}{\log\lambda} + \frac{\log(\varepsilon^3 n/\omega^2)}{\log\lambda} + O(1) + 2q - K \]

\[ = \frac{\log(\varepsilon^3 n)}{\log\lambda} + 2\frac{\log(\varepsilon^3 n)}{\log(1/\lambda_*)} + O(1) + 2q - K. \]

Given that $F(x)$ and $F(y)$ hold disjointly, the $(t_0 + t_1 + q)$-neighbourhoods of $x$ and $y$
each contain at most $\omega/\varepsilon$ vertices. Exploring from $x$ and $y$ in the obvious way, the rest
of the graph is 'unseen', and the expected number of paths of length at most $t_2 - K$
joining one neighbourhood to the other is at most

\[ (\omega/\varepsilon)^2 \sum_{k \le t_2 - K} n^{k-1} (\lambda/n)^k = \omega^2\varepsilon^{-2} \sum_{k \le t_2 - K} \lambda^k/n = O(\omega^2\varepsilon^{-3} n^{-1} \lambda^{t_2 - K}) = O(\lambda^{-K}) = o(1). \]

Hence, the conditional probability that $d(x,y) < d$ is $o(1)$, so $\mathbb{E}(M) = o(n(n-1)\mathbb{P}(A(x,y))) =
o((\mathbb{E}N)^2)$, using (93). It follows that whp there are at least $\mathbb{E}N/2 > 2$ vertices $x$ for
which $F(x)$ holds, but at most $(\mathbb{E}N)^2/5$ pairs of such vertices within distance $d$, so
$\mathrm{diam}(G) \ge d$ whp. Recalling that both $-q\varepsilon > 0$ and $K\varepsilon$ may be taken to tend to
infinity arbitrarily slowly, this completes the proof of the lower bound in Theorem 3.



4.7 The upper bound 

Throughout we fix a function $\varepsilon = \varepsilon(n)$ satisfying $\varepsilon \to 0$ and $\varepsilon^3 n \to \infty$. As before we
shall often write $\Lambda$ for $\varepsilon^3 n$, and set

\[ \omega = \Lambda^{1/6}. \]

As before, let $t_0 = \log(\varepsilon^3 n)/\log(1/\lambda_*)$, $t_1 = \log\omega/\log\lambda$, and $t_2 = \log(\varepsilon^3 n/\omega^2)/\log\lambda$;
we ignore rounding to integers, which makes no essential difference in our calculations.
Let $K = K(n)$ be such that $K\varepsilon \to \infty$, and let

\[ d_0 = \frac{\log(\varepsilon^3 n)}{\log\lambda} + 2\frac{\log(\varepsilon^3 n)}{\log(1/\lambda_*)} = 2t_0 + 2t_1 + t_2, \tag{99} \]

so our aim is to prove that $\mathrm{diam}(G) \le d_0 + K$ holds whp, and we may assume if we like
that $K\varepsilon$ grows slower than any given function of $n$ tending to infinity. The basic idea is
to simply estimate the expected number of pairs $x$, $y$ with $d(x,y) > d_0 + K$. However,
the calculations in the previous sections imply that on its own, this will not work; the
expectation turns out to be roughly $\varepsilon^{-4}$ if $K\varepsilon$ grows slowly. The reason is that, given
that a tree hanging off the 2-core has height at least $h$, the expected number of vertices
it contains at distance at least $h$ from the 2-core is of order $\varepsilon^{-2}$.
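
The length scales above are easy to evaluate numerically. The following is a minimal sketch, not part of the argument: it assumes that $\lambda_*$ is the dual parameter defined by $\lambda_* e^{-\lambda_*} = \lambda e^{-\lambda}$ with $\lambda_* < 1$ (a definition made outside this section), and the function names `dual_parameter` and `diameter_scales` are illustrative only. It checks the identity $d_0 = 2t_0 + 2t_1 + t_2$ from (99), ignoring rounding to integers.

```python
import math

def dual_parameter(lam, tol=1e-14):
    """Solve x*exp(-x) = lam*exp(-lam) for x in (0, 1) by bisection.

    Assumption: this is the dual parameter lambda_* of lam > 1.
    """
    if lam <= 1:
        raise ValueError("need lam > 1")
    target = lam * math.exp(-lam)
    lo, hi = 0.0, 1.0  # x*exp(-x) is increasing on [0, 1]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid * math.exp(-mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def diameter_scales(n, eps):
    """Return lambda_*, Lambda, omega, t0, t1, t2 and d0 for lambda = 1 + eps."""
    lam = 1 + eps
    lam_star = dual_parameter(lam)
    Lam = eps**3 * n            # Lambda = eps^3 n
    omega = Lam ** (1 / 6)      # omega = Lambda^{1/6}
    t0 = math.log(Lam) / math.log(1 / lam_star)
    t1 = math.log(omega) / math.log(lam)
    t2 = math.log(Lam / omega**2) / math.log(lam)
    d0 = math.log(Lam) / math.log(lam) + 2 * math.log(Lam) / math.log(1 / lam_star)
    return {"lam_star": lam_star, "Lambda": Lam, "omega": omega,
            "t0": t0, "t1": t1, "t2": t2, "d0": d0}

scales = diameter_scales(n=10**12, eps=0.01)
```

Since $2t_1 + t_2 = \log(\omega^2 \cdot \varepsilon^3 n/\omega^2)/\log\lambda$, the two expressions for $d_0$ agree up to floating-point error.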

To get around this, we need to impose a version of the wedge condition; we should
like to consider only vertices $x$ that are at maximal distance from the 2-core in their
tree. (Note that we cannot insist that $x$ is the unique vertex at this distance in its tree,
as we did before.) This suggests the weak wedge condition: roughly speaking, we should
like any 'side branches' starting from $\Gamma_t(x)$ to have height at most $t$, one more than the
height allowed in the strong wedge condition. This is all very well if the neighbourhoods
of $x$ out to the relevant distance form a tree, but in the upper bound we must consider
all vertices $x$, so we must modify the condition. Unfortunately most of the work in this
section will be needed to show that we can rule out various unlikely cases (such as the
diameter coming from a pair $x$, $y$ where $x$ is close to a short cycle).

Suppose that $x$ and $y$ are a pair of vertices at maximal distance, pick any $t < d(x,y)$,
and consider any shortest path $P$ from $x$ to $y$. Then, tracing $P$ backwards from $y$ to $x$,
we first meet $G_{\le t}(x)$ at some vertex $v_t \in \Gamma_t(x)$. Since $P$ is shortest, $d(v_t, y) = d(x,y) - t$,
so continuing from $v_t$ to $x$ along the unique path in $G^0_{\le t}(x)$ joining these vertices, we find
another shortest path $P'$ from $x$ to $y$ that starts with $v_0 v_1 v_2 \cdots v_t$, a path in $G^0_{\le t}(x)$. We
shall split the tree $G^0_{\le t}(x)$ into the trunk $T$, consisting of all vertices with descendants
in $\Gamma_t(x)$, plus one side branch $B_v$ for each $v \in T$. Here $B_v$ consists of $v$ together with
all its descendants in $G^0_{\le t}(x)$ that are not descendants of another trunk vertex. (This
corresponds roughly to the decomposition of $\mathfrak{X}_\lambda$ into $\mathfrak{X}^+_\lambda$ together with independent
copies of $\mathfrak{X}_{\lambda_*}$; the difference is that we only consider finitely many generations, as we
must in the graph.)

Of course each $v_i$ is a trunk vertex. The key observation is that for $0 \le i < t$, the
side branch $B_{v_i}$ is either short, i.e., has height at most $i$, or is reattached, i.e., $B_{v_i} - v_i$
meets an edge of $G_{\le t}(x) \setminus G^0_{\le t}(x)$. Otherwise, let $w$ be a vertex of $B_{v_i}$ at maximum
distance from $v_i$ in $B_{v_i}$. Since $B_{v_i}$ is not reattached, any path from $w$ to $y$ must pass via
$v_i$. Since $B_{v_i}$ is not short, the total length of such a path exceeds $d(x,y)$, contradicting
the assumption that $x$ and $y$ are at maximum distance.

Given $1 \le d < t$ and a vertex $x$, let $S$ denote the set of vertices of $\Gamma_d(x)$ that
have one or more descendants in $\Gamma_t(x)$, in the tree $G^0_{\le t}(x)$. We say that $x$ is $(d,t)$-acceptable
if there is a vertex $v \in S$ such that every side branch in $G^0_{\le t}(x)$ of the
path $x = v_0 v_1 \cdots v_d = v$ is either short or reattached. From the observation above,
if $x$ and $y$ are at maximal distance, then $x$ and $y$ must be $(d,t)$-acceptable for any
$1 \le d < t < d(x,y)$.
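
To make the definition concrete, here is a small sketch of the trunk and side-branch decomposition in the special case where $G_{\le t}(x)$ is itself a tree, so that no side branch can be reattached and acceptability reduces to every side branch along the path being short. The representation (a dictionary mapping each vertex to its list of children) and all function names are assumptions made for illustration; they do not come from the paper.

```python
def explore(children, root, t):
    """Breadth-first exploration of the first t generations below root."""
    depth = {root: 0}
    parent = {}
    order = [root]
    for v in order:  # the list grows while we iterate, giving BFS order
        if depth[v] < t:
            for c in children.get(v, []):
                depth[c] = depth[v] + 1
                parent[c] = v
                order.append(c)
    return depth, parent

def trunk(depth, parent, root, t):
    """Vertices having at least one descendant (or themselves) at depth t."""
    in_trunk = set()
    for v, dv in depth.items():
        if dv == t:
            u = v
            while u not in in_trunk:
                in_trunk.add(u)
                if u == root:
                    break
                u = parent[u]
    return in_trunk

def side_branch_height(children, v, in_trunk, depth):
    """Height of B_v: v plus its explored descendants reached via non-trunk children."""
    best = 0
    stack = [(v, 0)]
    while stack:
        u, h = stack.pop()
        best = max(best, h)
        for c in children.get(u, []):
            if c in depth and c not in in_trunk:
                stack.append((c, h + 1))
    return best

def is_acceptable_tree(children, root, d, t):
    """(d, t)-acceptability when the neighbourhood is a tree (no reattachment possible)."""
    depth, parent = explore(children, root, t)
    in_trunk = trunk(depth, parent, root, t)
    for v in in_trunk:
        if depth[v] != d:
            continue
        path = [v]
        while path[-1] != root:
            path.append(parent[path[-1]])
        path.reverse()  # path = v_0, v_1, ..., v_d
        if all(side_branch_height(children, path[i], in_trunk, depth) <= i
               for i in range(d)):
            return True
    return False
```

For example, in the tree with edges 0-1, 1-2, 2-3, 3-6, 1-4, 4-5 rooted at 0 and $t = 4$, the side branch at $v_1 = 1$ has height 2, so the root is not $(2,4)$-acceptable.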

Set $h = \varepsilon^{-1} \log\log\Lambda$, say. (Here $\varepsilon^{-1}$ times any slowly-enough growing function will
do.) For $t > h$, let $A_t = A_t(x)$ be the event that $x$ is $(h,t)$-acceptable, and let $B_t = B_t(x)$
be the event that $0 < |\Gamma_r(x)| < \omega/\varepsilon$ holds for $0 \le r \le t$. The following lemma will play
a key role in our estimates.

Lemma 37. Under the assumptions of Theorem 3 we have

\[ \mathbb{P}(A_t \cap B_t) \le (1 + o(1)) 4\gamma_0 \varepsilon^3 \lambda_*^{t - t_1} \tag{100} \]

uniformly in all $t_1 + 3h \le t \le 10\varepsilon^{-1}\log\Lambda$, where $\gamma_0 > 0$ is the constant appearing in
Lemma 31.

Recall that, by the second part of Theorem 29, in the branching process $\mathfrak{X}_\lambda$ we have

\[ \mathbb{P}(0 < |X_r| < \omega/\varepsilon,\ r = 0, \ldots, t) \sim 4\varepsilon\lambda_*^{t - t_1} \tag{101} \]

for any $t \le 10\varepsilon^{-1}\log\Lambda$ such that $\varepsilon(t - t_1) \to \infty$; by Lemma 17 this carries over to
the graph. Thus Lemma 37 says essentially that the conditional probability that our
modified wedge condition holds is asymptotically $\gamma_0\varepsilon^2$. We postpone the proof of the
lemma for the moment.

Unfortunately, to handle the case when $\Lambda = \varepsilon^3 n$ grows slowly, it turns out that we
need two further lemmas. The first is a very simple observation; once one thinks of the
lemma, it is very easy to prove. We thought of it after seeing the preprint of Ding, Kim,
Lubetzky and Peres [17].

Lemma 38. Let $L = L(n)$ be any function satisfying $L = o(1/\varepsilon)$. Then, under the
conditions of Theorem 3, whp the giant component of $G(n, \lambda/n)$ contains no cycle of
length at most $L$.

Proof. Fix $3 \le \ell \le L$ and a sequence $v_1, \ldots, v_\ell$ of distinct vertices of $G = G(n, (1+\varepsilon)/n)$.
Let $E$ be the event that this sequence forms a cycle, i.e., that the edges $v_1v_2$,
$v_2v_3, \ldots, v_\ell v_1$ are all present, so $\mathbb{P}(E) = ((1+\varepsilon)/n)^\ell \sim n^{-\ell}$. Let $F$ be the event that
$E$ holds and this cycle is in the giant component. First testing whether $E$ holds, and
then exploring outwards from this cycle, by comparison with the branching process as
usual we see that $\mathbb{P}(F \mid E) = O(\ell s) = O(\varepsilon\ell)$, with the implicit constant universal.
Hence $\mathbb{P}(F) = O(\varepsilon\ell n^{-\ell})$. Summing over all at most $n^\ell$ sequences, and dividing by $2\ell$
to avoid overcounting, the expected number of $\ell$-cycles in the giant component is thus
$O(\varepsilon)$. Finally summing over $\ell \le L$ and using Markov's inequality gives the result. □
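
The counting step in this proof (at most $n^\ell$ sequences, each cycle counted $2\ell$ times) can be checked by brute force on small cases. The sketch below is only an illustration with hypothetical function names: it counts the $\ell$-cycles of the complete graph $K_n$ directly and compares with $n(n-1)\cdots(n-\ell+1)/(2\ell)$, and it evaluates the resulting expected number of $\ell$-cycles in $G(n,p)$ (in the whole graph, without the extra factor $O(\varepsilon\ell)$ for lying in the giant component).

```python
from itertools import permutations
from math import perm

def count_cycles_brute(n, l):
    """Number of l-cycles in the complete graph K_n, by enumerating vertex sequences."""
    seen = set()
    for seq in permutations(range(n), l):
        # For l >= 3 a cycle is determined by its edge set.
        edges = frozenset(frozenset((seq[i], seq[(i + 1) % l])) for i in range(l))
        seen.add(edges)
    return len(seen)

def expected_cycles(n, l, p):
    """Expected number of l-cycles in G(n, p): (n)_l / (2l) * p^l."""
    return perm(n, l) / (2 * l) * p**l

# With p = lam/n the expectation is at most lam^l / (2l).
n, lam, l = 1000, 1.01, 5
bound_holds = expected_cycles(n, l, lam / n) <= lam**l / (2 * l)
```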

Lemma 39. Let $\psi = \psi(n)$ be some function of $n$ tending to infinity slowly, with $\psi =
O(\Lambda^{1/8})$ and $\psi = o(\varepsilon^{-1/10})$. Let $A^*(x)$ denote the event that $t_{\omega/\varepsilon}(x)$ is defined, $x$ is $(d,t)$-acceptable
for all $1 \le d < t \le t_{\omega/\varepsilon}(x)$, and $G_{\le t}(x)$ is a tree for $t = \min\{t_{\omega/\varepsilon}(x), \varepsilon^{-1}/\psi\}$.
Under the assumptions of Theorem 3 we have

\[ \mathbb{P}(A^*(x)) = O(\varepsilon^3\psi^8) = O(\varepsilon^3\Lambda). \]

(It is likely that the probability estimated above is $O(\varepsilon^3)$, at least if the quantity
$\varepsilon^{-1}/\psi$ in the definition of $t$ is replaced by a small constant times $\varepsilon^{-1}$, but even a bound
such as $O(\Lambda^{100}\varepsilon^3)$ would be more than enough for us here.)

Assuming Lemmas 37 and 39 for the moment, it is not hard to complete the proof
of Theorem 3, calculating as in Section 2 by summing the expected number of pairs $x$,
$y$ with $t_{\omega/\varepsilon}$ in certain ranges and both having acceptable neighbourhoods.

Proof of Theorem 3. Let $d_0$ be defined by (99), and let $K = K(n)$ be such that $K\varepsilon \to \infty$
and $K \le \varepsilon^{-1}\log\log\Lambda$, say. Our aim is to show that whp there is no pair $(x,y)$ of
vertices in the same component with $d(x,y) > d_0 + K$. In the light of Łuczak's bound (6)
from [26], and a standard duality argument, we need only consider the giant component.

Let us say that a vertex $x$ is tree-like if $G_{\le \varepsilon^{-1}/\psi}(x)$ is a tree. By Lemma 38, whp
every vertex in the giant component is tree-like, so it suffices to consider pairs $(x,y)$ in
which both $x$ and $y$ have this property.

As noted above, in any pair $(x,y)$ at maximal distance greater than $d_0$, both $x$ and
$y$ must be $(d,t)$-acceptable for any $d < t \le d_0$. Set

\[ t^+ = t_0 + t_1 + K/3, \]

noting that $t^+ < d_0/2$ and $t^+ > h = \varepsilon^{-1}\log\log\Lambda$. By Lemma 37, for any vertex $x$ we
have

\[ \mathbb{P}(A_{t^+}(x) \cap B_{t^+}(x)) \le (4 + o(1))\gamma_0\varepsilon^3\lambda_*^{t^+ - t_1} = O(n^{-1}\lambda_*^{K/3}) = o(n^{-1}), \]

so whp there is no vertex for which this event holds. Let $A'(x)$ be the event that $t_{\omega/\varepsilon}(x)$
is defined and at most $t^+$, and $A^*(x)$ holds, where $A^*(x)$ is defined in Lemma 39. Let
us call $(x,y)$ a regular far pair if $d(x,y) > d_0 + K$, and the events $A'(x)$ and $A'(y)$ hold.
Then from the comments above it suffices to prove that whp there are no regular far
pairs.

We may test whether $A'(x)$ holds by uncovering successive neighbourhoods of $x$,
stopping at the first (if there is one) with at least $\omega/\varepsilon$ vertices, and then testing for
acceptability and the tree condition, or stopping after $t^+$ steps if there is no such neighbourhood
(in which case $A'(x)$ does not hold). By definition, each neighbourhood other
than the last has at most $\omega/\varepsilon$ vertices. By (the graph version of) Lemma 28, the
probability that we find more than $2\omega/\varepsilon$ vertices in the last neighbourhood is at most
$\exp(-\Omega(\omega/\varepsilon)) = \exp(-\Omega(\varepsilon^{-1/2}n^{1/6})) = o(n^{-100})$. Ignoring this event, testing $A'(x)$
involves uncovering

\[ O(t^+\omega/\varepsilon) = O(\omega\log\Lambda/\varepsilon^2) = O(\Lambda^{1/3}\varepsilon^{-2}) = O(\Lambda^{1/3}\Lambda^{-2/3}n^{2/3}) = o(n^{2/3}) \]

vertices. Also, we uncover $O(\omega/\varepsilon) = o(n^{1/3})$ vertices in each generation. Noting that
$\varepsilon(t^+\omega/\varepsilon)^2 = O(\Lambda^{2/3}\varepsilon^{-3}) = O(n\Lambda^{-1/3}) = o(n)$, Lemmas 17 and 18 apply to the corresponding
trees. By Lemma 18, it follows that for $x$ and $y$ distinct,

\[ \mathbb{P}(A'(x) \cap A'(y) \cap \{d(x,y) > t_{\omega/\varepsilon}(x) + t_{\omega/\varepsilon}(y)\}) = (1 + o(1))\mathbb{P}(A'(x))\mathbb{P}(A'(y)) + o(n^{-100}) = O(\Lambda^2\varepsilon^6), \tag{102} \]

using $\mathbb{P}(A'(x)) \le \mathbb{P}(A^*(x))$ and Lemma 39 for the final bound. (In fact, we have glossed
over something here: using Lemma 18 shows that the events that the explorations
from $x$ and $y$ give certain trees consistent with $A'(x)$ and $A'(y)$ are asymptotically
independent. However, the events $A'(z)$, $z = x, y$, depend not just on the trees, but
also on any additional edges present inside the relevant sets $\Gamma_t(z)$. Since these are
present independently with probability $\lambda/n$, asymptotic independence of the trees gives
asymptotic independence of the entire neighbourhoods.)

Suppose we have explored the neighbourhoods of $x$ and $y$ and found that the event
described above holds, i.e., $A'(x)$ and $A'(y)$ hold disjointly. Then Lemma 15 applies, and
the conditional probability that the explorations do not meet within $t_2 + 2\varepsilon^{-1}\log\log\Lambda$
further steps is $\exp(-(1 + o(1))(\log\Lambda)^{2+o(1)}) + O(\Lambda^{-10}) = O(\Lambda^{-10})$. Summing over
choices for $x$ and $y$, we see that the expected number of regular far pairs with $d(x,y) >
t_{\omega/\varepsilon}(x) + t_{\omega/\varepsilon}(y) + t_2 + 2\varepsilon^{-1}\log\log\Lambda$ is $O(n^2\Lambda^2\varepsilon^6\Lambda^{-10}) = O(\Lambda^{-6}) = o(1)$. Hence, whp
there are no such pairs.

Set

\[ t^- = t_0 + t_1 - 2\varepsilon^{-1}\log\log\Lambda, \]

noting that whp every vertex $x$ in a regular far pair satisfies

\[ t_{\omega/\varepsilon}(x) > d_0 + K - (t_2 + 2\varepsilon^{-1}\log\log\Lambda) - t^+ > t_0 + t_1 - 2\varepsilon^{-1}\log\log\Lambda = t^-. \tag{103} \]

This value is large enough that Lemma 37 applies.

(Let us remark that if $\Lambda > (\log n)^{20}$, say, then the argument above simplifies: we may
replace $2\varepsilon^{-1}\log\log\Lambda$ by $2\varepsilon^{-1}\log\log n$, and the error probability given by Lemma 15
is then $o(n^{-100})$ (using the middle expression in (60)), so there is no need to check
acceptability to conclude the equivalent of (103). In particular, there is no need for
Lemma 39 in this case at all.)

For distinct vertices $x$ and $y$ and integers $t^- \le t, t' \le t^+$, let

\[ E_{x,y,t,t'} = A_{[t]}(x) \cap \{t_{\omega/\varepsilon}(x) = t\} \cap A_{[t']}(y) \cap \{t_{\omega/\varepsilon}(y) = t'\} \cap \{d(x,y) > d_0 + K\}, \]

where $[t]$ denotes the largest multiple of $\lfloor 1/\varepsilon \rfloor$ that is strictly smaller than $t$. From the
comments above, to prove that $\mathrm{diam}(G) \le d_0 + K$ holds whp it suffices to prove that whp
none of the events $E_{x,y,t,t'}$ holds. (Here we may impose whatever acceptability conditions
we like: the reason for choosing exactly $A_{[t]}(x)$ will become clear in a moment.)
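
As a small illustration of the rounding convention $[t]$, the following sketch (the name `block_floor` is hypothetical) computes the largest multiple of a block size $k$ that is strictly smaller than an integer $t$. All $t$ in a block $r+1, \ldots, r+k$, with $r$ a multiple of $k$, are mapped to the same value $r$.

```python
def block_floor(t, k):
    """Largest multiple of k that is strictly smaller than the integer t."""
    return k * ((t - 1) // k)

# Every t in {15, ..., 21} lies in the block above r = 14 when k = 7.
values = {block_floor(t, 7) for t in range(15, 22)}
```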

Using Lemma 18 as above, the probability that $E_1 = A_{[t]}(x) \cap \{t_{\omega/\varepsilon}(x) = t\}$ and
$E_2 = A_{[t']}(y) \cap \{t_{\omega/\varepsilon}(y) = t'\}$ hold with disjoint witnesses is asymptotically $\mathbb{P}(E_1)\mathbb{P}(E_2)$.
Noting that $d_0 + K - t - t' - t_2$ is within $5\varepsilon^{-1}\log\log\Lambda = o(t_2)$ of $0$ and is hence at least
$-t_2/2$, given that $E_1$ and $E_2$ hold disjointly, Lemma 15 tells us that the probability that
$d(x,y) > d_0 + K$ is $\exp(-(1 + o(1))\lambda^{d_0 + K - t - t' - t_2}) + O(\Lambda^{-10})$.

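Let $U = \bigcup_{x \ne y,\ t^- \le t, t' \le t^+} E_{x,y,t,t'}$. Then, writing $A \lesssim B$ for $A \le (1 + o(1))B$,

\[ \mathbb{P}(U) \lesssim n^2 \sum_{t=t^-}^{t^+} \sum_{t'=t^-}^{t^+} \mathbb{P}(A_{[t]}(x) \cap \{t_{\omega/\varepsilon}(x) = t\})\, \mathbb{P}(A_{[t']}(x) \cap \{t_{\omega/\varepsilon}(x) = t'\}) \left( \exp(-(1 + o(1))\lambda^{d_0 + K - t - t' - t_2}) + O(\Lambda^{-10}) \right). \]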

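Grouping the sums into blocks of size $k = \lfloor 1/\varepsilon \rfloor$, and noting that when $r$ is a multiple
of $k$ we have

\[ \sum_{t=r+1}^{r+k} \mathbb{P}(A_{[t]}(x) \cap \{t_{\omega/\varepsilon}(x) = t\}) = \mathbb{P}(A_r(x) \cap \{r < t_{\omega/\varepsilon}(x) \le r + k\}) \le \mathbb{P}(A_r(x) \cap B_r(x)), \]

we have

\[ \mathbb{P}(U) \lesssim n^2 \sideset{}{'}\sum_{t^- - k \le t \le t^+}\ \sideset{}{'}\sum_{t^- - k \le t' \le t^+} \mathbb{P}(A_t(x) \cap B_t(x))\, \mathbb{P}(A_{t'}(x) \cap B_{t'}(x)) \left( \exp(-(1 + o(1))\lambda^{d_0 + K - t - t' - t_2 - 2k}) + O(\Lambda^{-10}) \right), \]

where primes denote sums that run over multiples of $k$. From Lemma 37 we thus have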

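\[ \mathbb{P}(U) \lesssim n^2 \sideset{}{'}\sum_{t^- - k \le t \le t^+}\ \sideset{}{'}\sum_{t^- - k \le t' \le t^+} 16\gamma_0^2\varepsilon^6\lambda_*^{t + t' - 2t_1} \left( \exp(-(1 + o(1))\lambda^{d_0 + K - t - t' - t_2 - 2k}) + O(\Lambda^{-10}) \right) \]

\[ = o(1) + n^2 \sideset{}{'}\sum_{t^- - k \le t \le t^+}\ \sideset{}{'}\sum_{t^- - k \le t' \le t^+} 16\gamma_0^2\varepsilon^6\lambda_*^{t + t' - 2t_1} \exp(-(1 + o(1))\lambda^{d_0 + K - t - t' - t_2 - 2k}), \]

since there are at most $\varepsilon t^+ = O(\log\Lambda)$ terms in each sum, so the contribution of the
$O(\Lambda^{-10})$ term is at most $16n^2\gamma_0^2\varepsilon^6(\log\Lambda)^2 O(\Lambda^{-10}) = O(\Lambda^{-8}(\log\Lambda)^2) = o(1)$.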

Taking the final term in the two sums above, we have $t$ and $t'$ at least $t^+ - k$, so the
exponent of $\lambda$ above is at least $d_0 + K - 2t^+ - t_2 - 4k = K/3 - 4k$, which is at least
$K/4$ if $n$ is large. Hence the exponential term above is always at most $\exp(-\lambda^{K/4}/2)$,
say. Taking the final term in each sum, the corresponding $\lambda_*^{t + t' - 2t_1}$ term is at most

\[ \lambda_*^{2t^+ - 2k - 2t_1} = \lambda_*^{2t_0 + 2K/3 - 2k} \le \lambda_*^{2t_0} = \varepsilon^{-6}n^{-2}. \]

As $t + t'$ decreases from its maximum possible value in steps of $k$, the exponent of $\lambda$ in
the exponential increases by $k \sim 1/\varepsilon \sim 1/\log\lambda$, so the $\lambda^{\cdots}$ term increases by a factor
that is asymptotically $e$ and certainly at least 2. The $\lambda_*^{t + t' - 2t_1}$ term increases by a factor of
$\lambda_*^{-k}$ which is asymptotically $e$ and certainly at most 3. Also, after $r$ steps, there are at
most $r + 1$ ways of realizing a given sum $t + t'$. It follows that

\[ \mathbb{P}(U) \le o(1) + \sum_{r=0}^{\infty} 16\gamma_0^2 (r + 1) 3^r \exp(-\lambda^{K/4} 2^{r-1}), \]

say. Since $\lambda^{K/4} \to \infty$, the exponential term in the final sum decreases extremely rapidly,
and the whole sum is dominated by its first term, which is $o(1)$. This completes the
proof of Theorem 3, assuming Lemmas 37 and 39. □
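
The claim that the final sum is dominated by its first term is easy to check numerically. The sketch below is an illustration only: it drops the constant $16\gamma_0^2$, writes $x$ for $\lambda^{K/4}$, and truncates the series at a fixed number of terms (the name `tail_sum` and the cutoff are assumptions, not from the paper).

```python
import math

def tail_sum(x, terms=200):
    """Evaluate sum_{r=0}^{terms-1} (r + 1) * 3^r * exp(-x * 2^(r-1))."""
    total = 0.0
    for r in range(terms):
        # exp underflows harmlessly to 0.0 once x * 2^(r-1) is large.
        total += (r + 1) * 3**r * math.exp(-x * 2.0 ** (r - 1))
    return total

x = 50.0
first_term = math.exp(-x / 2)
ratio = tail_sum(x) / first_term  # close to 1 when x is large
```

For large $x$ each successive term is smaller than the previous one by a factor of roughly $\exp(-x 2^{r-1})$ up to a polynomial factor in $3^r$, which is why the first term controls the whole sum.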

Let us note for later, when we come to consider the distribution of the diameter, that
if we modify the definition of $E_{x,y,t,t'}$ by replacing $d(x,y) > d_0 + K$ by $d(x,y) > d_0 - K$,
then we obtain

\[ \mathbb{P}(U) \le \sum_{r=0}^{\infty} 16\gamma_0^2 (r + 1) 3^r \exp(-\lambda^{K/4 - 2K} 2^{r-1}). \]

Indeed, everything is as before except that the exponent of $\lambda$ has decreased by $2K$.
Now this new sum is large, but the contribution from terms with $r \ge \log(\lambda^{3K})/\log 2 \sim
3K\varepsilon/\log 2$ is still small. Hence, the sum from terms in which one of $t$, $t'$ is smaller than
$t^+$ by more than $3\lfloor 1/\varepsilon \rfloor K\varepsilon/\log 2 \sim 3K/\log 2 < 5K$ is small. Since $\mathrm{diam}(G) \ge d_0 - K$
whp, it follows that whp the diameter is realized by vertices $x$ and $y$ which form a
regular far pair in which each vertex $z$ has $t_0 + t_1 - 5K \le t_{\omega/\varepsilon}(z) \le t^+ = t_0 + t_1 + K/3$.
Since $K\varepsilon$ may be taken to tend to infinity arbitrarily slowly, this says that for a given
error probability, it suffices to consider $t_{\omega/\varepsilon}(z) = t_0 + t_1 + O(1/\varepsilon)$.

It remains to prove Lemmas 37 and 39.

Proof of Lemma 37. Recall that $t_1 + 3h \le t \le 10\varepsilon^{-1}\log\Lambda$, and set $d = h = \varepsilon^{-1}\log\log\Lambda <
t/2$. Let $A = A(x)$ denote the event that $x$ is $(d,t)$-acceptable, and $B_t = B_t(x)$ the event
that $0 < |\Gamma_r(x)| < \omega/\varepsilon$ holds for $0 \le r \le t$. Our aim is to bound the probability of
$A \cap B_t$; note that this event depends only on $G_{\le t}(x)$.

To avoid dependence, we'd like to work with the branching process rather than the
graph, but we cannot assume that the relevant neighbourhoods of $x$ are trees. So let
us model the pair $(G^0_{\le t}(x), G_{\le t}(x))$ by a pair $(T^*, G^*)$ as follows: first construct the
branching process $(X_r)_{0 \le r \le t}$. Let $T^*$ be the corresponding rooted tree of height at
most $t$. Given $T^*$, i.e., given $(X_r)$, form $G^*$ by starting with $T^*$ and joining every pair of
vertices in the same set $X_r$ with probability $\lambda/n$, independently of all other pairs. It is
clear that the conditional distribution of $G^*$ given $T^*$ is the same as that of $G_{\le t}(x)$ given
$G^0_{\le t}(x)$. If $(T_0, G_0)$ is any possible value of $(G^0_{\le t}(x), G_{\le t}(x))$ consistent with $A \cap B_t$, then
since $B_t$ holds, $T_0$ is a tree to which Lemma 17 applies. So $\mathbb{P}(G^0_{\le t}(x) \cong T_0) \sim \mathbb{P}(T^* \cong
T_0)$. It follows that $\mathbb{P}(G_{\le t}(x) \cong G_0) \sim \mathbb{P}(G^* \cong G_0)$. Hence, $\mathbb{P}((T^*, G^*) \in A \cap B_t)$ is
asymptotically equal to the probability that $(G^0_{\le t}(x), G_{\le t}(x)) \in A \cap B_t$. From now on
we consider the model $(T^*, G^*)$, forgetting about the graph $G(n,p)$.

For technical reasons we modify $G^*$ slightly as follows: within each set $X_r$, we only
test for edges between the first $\omega/\varepsilon$ vertices in some auxiliary random order. This does
not affect the probability of $A \cap B_t$, since when $B_t$ holds (which is determined by $T^*$),
the distribution of $G^*$ given $T^*$ is unchanged.

Let $S$ be the set of particles in $X_d$ with descendants in $X_t$. To achieve independence
between $A$ and $B_t$, let us weaken $B = B_t$ to $B' = B'_t$, the condition that for every $v \in S$,
the number of descendants of $v$ in each $X_r$, $d \le r \le t$, is at most $\omega/\varepsilon$. Our aim is to
bound $\mathbb{P}(A \cap B)$ by $\mathbb{P}(A \cap B')$; to evaluate the latter we estimate $\mathbb{P}(B')$ and $\mathbb{P}(A \mid B')$.

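Our first aim is to show that

\[ \mathbb{P}(B') \sim \mathbb{P}(B) = p_0 \sim 4\varepsilon\lambda_*^{t - t_1}, \tag{104} \]

where the final estimate is from (101). Recall that $d = h$. Note that $\mathbb{P}(B' \mid |S| = s) = p^s$,
where $p$ is the (unconditional) probability that $0 < |X_r| < \omega/\varepsilon$ holds for $0 \le r \le t - h$.
From Theorem 29, we have $p \sim 4\varepsilon\lambda_*^{t - h - t_1} \sim p_0\lambda_*^{-h}$. Also, since $t \ge t_1 + 3h$, we have
$p \le (1 + o(1))\lambda_*^{2h}$. Since $\varepsilon h \to \infty$ it follows that $p^2 \le (1 + o(1))\lambda_*^{h} p_0 = o(p_0)$.
Since $B \subseteq B'$, we have

\[ \mathbb{P}(B \cap \{|S| \ge 2\}) \le \mathbb{P}(B' \cap \{|S| \ge 2\}) \le \mathbb{P}(B' \mid \{|S| \ge 2\}) \le p^2 = o(\mathbb{P}(B)). \]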

Recalling that if $B$ or $B'$ holds then $|S| \ge 1$, to show that $\mathbb{P}(B') \sim \mathbb{P}(B)$ it suffices to
show that $\mathbb{P}(B \cap \{|S| = 1\}) \sim \mathbb{P}(B' \cap \{|S| = 1\}) = p$. Let $B''$ be a strengthened version
of $B'$, where we replace the upper bound $\omega/\varepsilon$ by $\omega'/\varepsilon$, with $\omega' = (1 - 1/\log\Lambda)\omega \sim \omega$.
Applying Theorem 29 again with this new value of $\omega'$, we find that $\mathbb{P}(B'' \mid |S| = 1) \sim p$.
But given that $|S| = 1$ and $B''$ holds, $B$ certainly holds as long as the tree $T$ formed by
the descendants of the root that are not descendants of the unique particle in $S$ contains
at most $\omega/(\varepsilon\log\Lambda) > \Lambda^{1/10}/\varepsilon$ particles in each generation.

The distribution of $T$ is dominated by that of the tree $T'$ formed by starting one copy
of $\mathfrak{X}_{\lambda_*}$ in each generation $0 \le t < h$. (In $T$ these copies are conditioned to die by a specific
time.) The first $h - 1$ generations of $T'$ have exactly the distribution of the process $(D_t)$
studied in Lemma 32. Hence, by the second part of that lemma, the probability that
one of the first $h$ generations of $T'$ exceeds size $\Lambda^{1/20}/\varepsilon$ is $O(\varepsilon h e^{-\Omega(\Lambda^{1/20})}) = o(1)$. From
generation $h$ onwards, the tree $T'$ evolves as a subcritical branching process, and from
a standard martingale argument the probability that any later generation exceeds the
size of generation $h$ by a factor of $\Lambda^{1/20}$ is at most $1/\Lambda^{1/20} = o(1)$. Thus we do indeed
have $\mathbb{P}(B \mid B'' \cap \{|S| = 1\}) \sim 1$, and it follows that $\mathbb{P}(B') \sim \mathbb{P}(B)$, as claimed.

Recalling that $p^2 = o(\mathbb{P}(B))$ and hence $p^2 = o(\mathbb{P}(B'))$, for $r \ge 2$ we have

\[ \mathbb{P}(|S| = r \mid B') \le \mathbb{P}(B' \mid |S| = r)/\mathbb{P}(B') = o(p^{r-2}). \]

Summing, it follows that

\[ \mathbb{E}(|S| \mid B') \sim 1. \tag{105} \]

We claim that (in the modified $G^*$ model)

\[ \mathbb{P}(A \mid B', |S| = N) \le (1 + o(1))\gamma_0 N\varepsilon^2. \tag{106} \]

Using $\mathbb{P}(A \mid B') = \sum_{N \ge 1} \mathbb{P}(|S| = N \mid B')\,\mathbb{P}(A \mid B', |S| = N)$, and (105) and (106), we
obtain the required bound on $\mathbb{P}(A \cap B) \le \mathbb{P}(A \cap B')$.

It remains only to prove (106). Recall that we are working with the model $(T^*, G^*)$.
Let us construct $T^*$ (which is simply the first $t$ generations of $\mathfrak{X}_\lambda$) by decomposing it
into the trunk and side branches exactly as in the graph. Thus the trunk consists of the
subtree $T'$ of $T^*$ consisting of all particles with descendants in $X_t$. Then $T^*$ may be
formed by adding for each $v$ in generation $r$, $0 \le r < t$, of $T'$ a copy $W_v$ of the process
$(X_{t'})_{0 \le t' \le t - r}$ conditioned on $X_{t-r}$ being empty. We may think of $W_v$ as the subcritical
process $\mathfrak{X}_{\lambda_*}$ conditioned on dying out by a particular time.

Now whether $B'$ holds is determined by $T'$ together with the trees $W_v$ for vertices
$v$ in sets $X_r$, $r \ge h$. Let us condition on $T'$ and these trees $W_v$; the only remaining
randomness is in the $W_v$ for $v \in X_r$, $r < h$.

Let $v$ be one of the $N = |S|$ vertices in $S$, and let $x = v_0 v_1 \cdots v_h = v$ be the path to
$v$. Let $W_i = W_{v_i}$, for $0 \le i \le h - 1$. Let $A_v$ be the event that every $W_i$ is either short
or, when we come to $G^*$, reattached. Note that $A$ holds if and only if one of the events
$A_v$ holds, so it suffices to prove that the conditional probability of $A_v$ is $(1 + o(1))\gamma_0\varepsilon^2$.
Since the different $W_w$ are independent given $T'$, the conditional distribution of each $W_i$
(given $T'$ and the $W_w$, $w \in X_r$, $r \ge h$) is just the unconditioned distribution. Writing,
as before, $s_i = \mathbb{P}(|X^-_i| > 0)$ and $d_i = 1 - s_i$, the probability that $W_i$ is tall (not short)
is just

\[ p_i = \mathbb{P}(|X_{i+1}| > 0 \mid |X_{t-i}| = 0) = \mathbb{P}(|X^-_{i+1}| > 0 \mid |X^-_{t-i}| = 0) = \frac{d_{t-i} - d_{i+1}}{d_{t-i}} = \frac{s_{i+1} - s_{t-i}}{d_{t-i}} = s_{i+1} - O(s_{t-i}) = s_{i+1} - O(s_d), \tag{107} \]

since $t \ge 2d$ and $i < d$.

Now $d = h \ge 1/\varepsilon$, so (by Lemma 31) $s_d = O(\varepsilon\lambda_*^d)$. Let $w$ be the number of tall $W_i$.
Then from the estimate above and Lemma 31,

\[ \mathbb{P}(w = 0) = \prod_{i=0}^{h-1}(1 - p_i) = \exp(O(h\varepsilon\lambda_*^h)) \prod_{i=1}^{h}(1 - s_i) \sim \prod_{i=1}^{\infty}(1 - s_i) \sim \gamma_0\varepsilon^2, \]

since $\lambda_*^h = \exp(-(1 + o(1))\varepsilon h)$ and $\varepsilon h \to \infty$, so $h\varepsilon\lambda_*^h = o(1)$. It thus suffices to show
that $\mathbb{P}(A_v) \le (1 + o(1))\mathbb{P}(w = 0)$; then (106) follows by the union bound. In other
words, we must show that $A_v \cap \{w > 0\}$ is much less likely than $w = 0$.

Let $I$ be any subset of $\{0, 1, 2, \ldots, h - 1\}$ with $|I| \ge 1$, and let us condition on
precisely the corresponding trees $W_i : i \in I$ being tall. Let $M_i$, $i \in I$, be the number of
vertices in each tall tree $W_i$, noting that these numbers are conditionally independent.
Given that a particular $W_i$ is tall, its average size is at most that of $\mathfrak{X}_{\lambda_*}$ conditioned to
survive to height $i + 1$ (we also condition on dying out by height $t - i$). By Lemma 33
this is at most $(i + 2)/\varepsilon$.

Let us now go through the tall trees in order, checking to see whether each is reattached.
(We will be forced to skip some; see below.) Due to the way we modified $G^*$,
when checking if $W_i$ is reattached, for each vertex $u$ of $W_i$ we need only check for edges
of $G^* \setminus T^*$ between $u$ and the first (in a random order) $\omega/\varepsilon$ vertices of the set $X_r$ that
$u$ lies in, or all vertices of $X_r$ if there are fewer than $\omega/\varepsilon$. For each $u$, the probability of
finding such an edge is at most $p = (\omega/\varepsilon)\lambda/n \le 2\omega\varepsilon^{-1}n^{-1}$. The probability that the tall
tree $W_i$ reattaches is thus at most $\mathbb{E}(M_i p) = \mathbb{E}(M_i)p \le (i + 2)\varepsilon^{-1}p \le 2(i + 2)\omega\varepsilon^{-2}n^{-1}$,
say.



When testing whether the first tall tree does reattach, we stop if we find one edge
witnessing this. This edge may spoil a later tall $W_i$ by going to a vertex of that $W_i$.
It's not hard to see that this is more likely for a larger later $W_i$, so the conditional
expectation of $M_i$ given that the first tall tree reattached but did not spoil $W_i$ is at
most $\mathbb{E}(M_i)$. Continuing, we can crudely bound the conditional probability (given $I$)
that all tall trees reattach by

\[ \sum_{J} \prod_{j \in J} \mathbb{E}(M_j)\,p, \]

where the sum runs over all subsets $J$ of $I$ containing at least half of the first $k$ elements
of $I$ for every $k \le |I|$, corresponding to the fact that we test trees in order, and each
spoils at most one later one. (In fact, we can replace the sum by a maximum, but this
makes no difference.) Suppose that $|I| = 2k - 1$ or $|I| = 2k$, and list the elements of
$I$ as $i_1, i_2, \ldots$ in order. There are at most $4^k$ terms in the sum, and the largest has
$J = \{i_1, i_3, \ldots, i_{2k-1}\}$, so given $I$, the probability of reattachment is at most

\[ \frac{4(i_1 + 2)\omega}{\varepsilon^2 n} \cdot \frac{4(i_3 + 2)\omega}{\varepsilon^2 n} \cdots \frac{4(i_{2k-1} + 2)\omega}{\varepsilon^2 n}. \]

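Now the probability that exactly the trees indexed by $I$ are tall is exactly

\[ \mathbb{P}(w = 0) \prod_{i \in I} \frac{p_i}{1 - p_i} \le \mathbb{P}(w = 0) \prod_{i \in I} 3p_i \le \mathbb{P}(w = 0) \prod_{i \in I} \frac{10}{i + 2}, \]

say, noting that $p_i \le s_{i+1}$ and using the crude upper bound $3/(i + 2)$ for $s_{i+1}$. Summing
over $I$ with $|I| \ge 1$ we find that

\[ \mathbb{P}(A_v \cap \{w > 0\}) \le \mathbb{P}(w = 0) \sum_{k} \sum_{i_1 < i_2 < \cdots < i_k < d} \frac{10}{i_1 + 2} \cdots \frac{10}{i_k + 2} \cdot \frac{4(i_1 + 2)\omega}{\varepsilon^2 n} \cdot \frac{4(i_3 + 2)\omega}{\varepsilon^2 n} \cdots. \]

The sum over even $k$ may be crudely bounded by $\sum_{j=1}^{\infty} S^j$ where

\[ S = \sum_{0 \le a < b < d} \frac{10}{a + 2} \cdot \frac{10}{b + 2} \cdot \frac{4(a + 2)\omega}{\varepsilon^2 n} = \frac{400\omega}{\varepsilon^2 n} \sum_{b < d} \frac{b}{b + 2} \le 400 d\omega\varepsilon^{-2}n^{-1}. \]

Since $d \le \varepsilon^{-1}\log\log\Lambda$, we have $S = o(1)$. Bounding the sum over odd $k$ similarly, it
follows that $\mathbb{P}(A_v \cap \{w > 0\}) = o(\mathbb{P}(w = 0))$, as required. □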

Finally, we prove Lemma 39.

Proof of Lemma 39. Throughout this proof, let $K = \lceil \log(1/\varepsilon) \rceil$ and, for $1 \le k \le K$, let
$t_k = \varepsilon^{-1}/(k\psi)$. (We ignore the irrelevant rounding to integers, noting that $t_K \to \infty$.)

For $2 \le k \le K$ let $E_k(x)$ denote the event that $t_k < t_{\omega/\varepsilon}(x) \le t_{k-1}$, the neighbourhoods
of $x$ to distance $t_{\omega/\varepsilon}(x)$ form a tree, and $x$ is $(t_k/2, t_k)$-acceptable. Let
$E_1(x)$ denote the event that $B_{t_1}$ holds (i.e., $0 < |\Gamma_i(x)| < \omega/\varepsilon$ for $0 \le i \le t_1$), that
$G_{\le t_1}(x)$ is a tree, and $x$ is $(t_1/2, t_1)$-acceptable. Finally, let $E_\infty(x)$ denote the event
that $t_{\omega/\varepsilon}(x) \le t_K$. Splitting into cases according to the value of $t_{\omega/\varepsilon}(x)$, we see that if
$A^*(x)$ holds, then so does one of the events $E_1(x)$, $E_\infty(x)$ or $E_k(x)$, $2 \le k \le K$.
Let us start with a simple branching process observation related to that in Lemma 32,
writing $t_{\omega/\varepsilon}$ for $\min\{t : |X_t| \ge \omega/\varepsilon\}$, as before, whenever this is defined. Suppose we have
chosen some $t \ge 1$ in advance. If we explore the branching process step by step and find
a generation $X_r$, $r \le t$, with size at least $\omega/\varepsilon$, then it is easy to see that the conditional
probability that $|X_t| \ge \omega/\varepsilon$ is at least $1/10$, say. Thus $\mathbb{P}(|X_t| \ge \omega/\varepsilon) \ge \mathbb{P}(t_{\omega/\varepsilon} \le t)/10$,
and hence

\[ \mathbb{P}(t_{\omega/\varepsilon} \le t) \le 10\,\mathbb{P}(|X_t| \ge \omega/\varepsilon). \tag{108} \]

Using this observation and Lemma 30, we see that

\[ \mathbb{P}(t_{\omega/\varepsilon} \le t_K) \le 10 t_K^{-1} e^{-\omega/(20\varepsilon t_K)} = 10 t_K^{-1} e^{-\omega K\psi/20} = o(\varepsilon^3), \]

since $\omega\psi \to \infty$ while $K \ge \log(1/\varepsilon)$. Comparing the graph and branching process as
usual, it follows that $\mathbb{P}(E_\infty(x)) = o(\varepsilon^3)$.

Turning to $E_k(x)$ for $1 \le k \le K$, note that we may test whether this event holds by
exploring at most $t_1 = O(1/\varepsilon)$ steps from $x$, stopping if we reach a neighbourhood of
size $\omega/\varepsilon$. Arguing as for (102), Lemma 17 thus gives $\mathbb{P}(E_k(x)) = (1 + o(1))\mathbb{P}(E_k) +
O(n^{-100})$, where $E_k$ is the branching process event corresponding to $E_k(x)$. It thus
suffices to show that

\[ \sum_{k=1}^{K} \mathbb{P}(E_k) = O(\varepsilon^3\psi^8). \tag{109} \]

This statement involves only the branching process $\mathfrak{X}_\lambda$, so from now on we work with
this rather than the graph.

Let $A_k$ be the event that the branching process satisfies the condition corresponding
to $(t_k/2, t_k)$-acceptability. To simplify the arguments, for $2 \le k \le K$ let $E'_k$ be the event
that $A_k$ holds and $|X_{t_{k-1}}| \ge \omega/\varepsilon$. Only the last condition involves generations beyond
$t_k$, so arguing as for (108) we have $\mathbb{P}(E'_k \mid E_k) \ge 1/10$, and hence $\mathbb{P}(E_k) \le 10\,\mathbb{P}(E'_k)$.
Also, let $E'_1$ be the event that $A_1$ holds and $0 < |X_{t_1}| < \omega/\varepsilon$. Then $E'_1 \supseteq E_1$. Hence

\[ \mathbb{P}(E_k) \le 10\,\mathbb{P}(E'_k) \tag{110} \]

for all $1 \le k \le K$.

For $k \ge 2$ let $L_k$ be the event that $|X_{t_{k-1}}| \ge \omega/\varepsilon$; let $L_1$ be the event that $|X_{t_1}| > 0$,
so $E'_k = A_k \cap L_k$. As before, let $T$ be the trunk of $\mathfrak{X}_\lambda$ defined up to generation $t_k$, so $T$
is the random tree consisting of all particles with descendants in $X_{t_k}$. If we condition
on the first $t_k$ generations of $\mathfrak{X}_\lambda$, then the conditional probability of $L_k$ depends only on
$|X_{t_k}|$. Since knowing the trunk $T$ determines $|X_{t_k}|$, we thus have

\[ \mathbb{P}(E'_k) = \mathbb{P}(A_k \cap L_k) = \sum_{T'} \mathbb{P}(T = T')\,\mathbb{P}(A_k \mid T = T')\,\mathbb{P}(L_k \mid T = T'), \]

where the sum runs over all possible trunks $T'$. Note that we may assume $T'$ is non-empty,
i.e., $|X_{t_k}| > 0$, as otherwise $L_k$ cannot hold.

As before, given the trunk, we may reconstruct $X_{\le t_k}$ by adding independent random
branches to each trunk vertex, with each branch a copy of $\mathfrak{X}_{\lambda_*}$ conditioned to die by
(absolute, not relative) time $t_k$. Let $S$ be the set of trunk vertices in generation $t_k/2$,
and $N = |S|$ the number of such vertices, so $N$ is random but depends only on $T$. Since
we are considering the branching process, which is by definition a tree, the acceptability
condition $A_k$ holds if and only if some $v \in S$ has the property that the side branch
started at each $v_i$ has height at most $i$ for all $0 \le i < t_k/2$, where $v_0 v_1 v_2 \cdots v_{t_k/2} = v$
is the chain of ancestors of $v$. For a given $v$, the probability of this event is exactly
$\prod_{i=0}^{t_k/2 - 1}(1 - p_i)$, where $p_i$ is given by (107) with $t = t_k$ and $d = t_k/2$. (The argument is
as for (107).) It follows easily from the estimates in Lemma 31 that $s_d = s_{t_k/2} = \Theta(t_k^{-1})$
and that

\[ \prod_{i=0}^{t_k/2 - 1}(1 - p_i) = \Theta(1) \prod_{i=1}^{t_k/2}(1 - s_i) = O(t_k^{-2}). \]



So far we considered a single $v \in S$; by the union bound it follows that $\mathbb{P}(A_k \mid T =
T') \le C t_k^{-2} N(T')$ for some absolute constant $C$. Hence

\[ \mathbb{P}(E'_k) \le C t_k^{-2} \sum_{T'} \mathbb{P}(T = T')\,N(T')\,\mathbb{P}(L_k \mid T = T'). \tag{111} \]

Let no = no (A;) = (kt/j) 5 . Let u^T and //^" denote respectively the contributions to the 
sum in (TlTTT) from trunks V with N(T') < n and N(T') > n , so P(££) < n k + ^ . 
Trivially we have 

< Ctfe 2 n J]P(T = T')F(L k \T = T') = Ct k 2 n P(L k ). 

T' 

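For $k = 1$ we have $\mathbb{P}(L_1) = \mathbb{P}(|X_{t_1}| > 0)$. Writing $\mathcal{S}$ for the event that the whole
process survives, we have

\[ \mathbb{P}(|X_t| > 0) = s + (1 - s)\,\mathbb{P}(|X_t| > 0 \mid \mathcal{S}^{\mathrm{c}}) = s + (1 - s)\,\mathbb{P}(|X_t^-| > 0). \]

By Lemma 31 it follows that for $t = o(1/\varepsilon)$ we have

\[ \mathbb{P}(|X_t| > 0) \sim 2/t. \tag{112} \]

In particular, $\mathbb{P}(L_1) = O(1/t_1)$, so

\[ \mu_1^- \le C t_1^{-3} n_0 = O(\varepsilon^3 \psi^3 \psi^5) = O(\varepsilon^3 \psi^8). \]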
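
The asymptotic (112) is easy to check numerically. The following is a minimal sketch, assuming Poisson offspring with mean $\lambda = 1 + \varepsilon$ and a single initial particle; the function name and the chosen values of $\varepsilon$ and $t$ are illustrative only and are not taken from the paper. It iterates the standard one-step recursion $s_{t+1} = 1 - \exp(-\lambda s_t)$ for the survival probability $s_t = \mathbb{P}(|X_t| > 0)$.

```python
import math

def survival_probabilities(lam, t_max):
    """Return [s_0, ..., s_{t_max}] with s_t = P(|X_t| > 0) for a Galton-Watson
    process with Poisson(lam) offspring, started from one particle."""
    s = 1.0                      # generation 0 always contains the initial particle
    out = [s]
    for _ in range(t_max):
        # The process survives t+1 generations iff at least one child of the root
        # survives t generations; the number of such children is Poisson(lam * s_t).
        s = -math.expm1(-lam * s)
        out.append(s)
    return out

eps = 1e-5                       # illustrative value, so that t = 1000 is o(1/eps)
s = survival_probabilities(1.0 + eps, 1000)
for t in (10, 100, 1000):
    print(t, t * s[t])           # t * s_t should approach 2 as t grows
```

The convergence of $t s_t$ to 2 is slow (the relative error is of order $(\log t)/t$), which is harmless for the $O(1/t_1)$ bound used above.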
For $k \ge 2$, from (108) and Lemma 30, we have

P(L fc ) < 10P(|X tfc _J >w/e) < Wt-^e-^-W 20 , 
so ^ < WCt^t^e-^- 1 ^ 20 . Recalling that t k = e' 1 /^), it follows that 

fc=2 fc>2 

Since $\omega$ and $\psi$ are large for $n$ large, the first term dominates, and this sum is $o(\varepsilon^3)$.
Together with the bound for $\mu_1^-$ above this gives

\[ \sum_{k=1}^{K} \mu_k^- = O(\varepsilon^3 \psi^8). \tag{113} \]

It remains to bound $\mu_k^+$. Noting that $N(N - 1) \ge n_0 N$ whenever $N > n_0$, and that
$\mathbb{P}(L_k \mid T = T') \le 1$, we have

\[ \mu_k^+ \le C t_k^{-2} \sum_{T'} \mathbb{P}(T = T')\,n_0^{-1} N(T')(N(T') - 1) = C t_k^{-2} n_0^{-1}\,\mathbb{E}(N(N - 1)), \]

where the final expectation is unconditional. Given $X_{t_k/2}$, each particle in this generation
survives to generation $t_k$ independently with probability $p = \mathbb{P}(|X_{t_k/2}| > 0) = O(t_k^{-1})$,
from (112). Hence

\[ \mathbb{E}(N(N - 1)) = p^2\,\mathbb{E}\bigl(|X_{t_k/2}|(|X_{t_k/2}| - 1)\bigr) = O(t_k^{-2})\,\mathbb{E}\bigl(|X_{t_k/2}|(|X_{t_k/2}| - 1)\bigr). \]

A simple inductive formula, or a tree counting argument, gives
$\mathbb{E}(|X_t|(|X_t| - 1)) = \lambda^t(\lambda + \lambda^2 + \cdots + \lambda^t) \le t\lambda^{2t}$. With $t = t_k/2 \le 1/\varepsilon$, this is
$O(t_k)$, so $\mathbb{E}(N(N - 1)) = O(t_k^{-1})$. Hence,

\[ \mu_k^+ = O(t_k^{-3} n_0^{-1}) = O\bigl(\varepsilon^3 \psi^3 k^3 (k\psi)^{-5}\bigr) = O(\varepsilon^3 k^{-2}). \]

Thus $\sum_{k=1}^{K} \mu_k^+ = O(\varepsilon^3)$. Recalling that $\mathbb{P}(E'_k) \le \mu_k^- + \mu_k^+$, and using (113) and (110),
this establishes (109). As noted earlier, the lemma follows. □
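
The 'simple inductive formula' for the second factorial moment used in the proof above can be checked mechanically. The sketch below assumes Poisson($\lambda$) offspring, for which conditioning on generation $t$ gives $\mathbb{E}(|X_{t+1}|(|X_{t+1}| - 1)) = \lambda^2\,\mathbb{E}(|X_t|^2)$; the function names are ours and purely illustrative.

```python
def second_factorial_moment(lam, t):
    """E(|X_t|(|X_t| - 1)) for Poisson(lam) offspring, via the one-step recursion."""
    f, mean = 0.0, 1.0           # generation 0: exactly one particle
    for _ in range(t):
        # Given |X_t| = n, |X_{t+1}| is Poisson(lam * n), whose second factorial
        # moment is (lam * n)^2; taking expectations, E(n^2) = f + mean.
        f = lam * lam * (f + mean)
        mean *= lam
    return f

def closed_form(lam, t):
    """lam^t * (lam + lam^2 + ... + lam^t), the formula quoted in the text."""
    return lam ** t * sum(lam ** j for j in range(1, t + 1))

eps = 0.01                        # illustrative value
lam = 1.0 + eps
for t in (1, 10, 100):            # t <= 1/eps, as in the proof
    print(t, second_factorial_moment(lam, t), closed_form(lam, t), t * lam ** (2 * t))
```

For $t \le 1/\varepsilon$ we have $\lambda^{2t} \le e^2$, so the last column is at most $e^2 t$, which is the $O(t_k)$ bound used in the proof.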

Remark 5. As noted earlier, in the first draft of this paper we needed the condition 
A > e( log . The changes that allowed us to eliminate this are the introduction of 
Lemma 38 (making checking for acceptability in the case when $t_{\omega/\varepsilon}(x) = o(1/\varepsilon)$ much
simpler), the modification of Lemma 39 to include the tree condition, and the new proof
of Lemma 39 above.

5 The distribution of the correction term 

In this section we shall describe the limiting distribution of the correction term in
Theorem 3 and, very briefly, that in Theorem 1. Surprisingly, although Theorem 3
is much harder to prove than Theorem 1, the study of the correction term is much
easier in the former case. Indeed, with $p = \lambda/n$ and $\lambda$ constant, even the description of
the correction term is rather complicated. Let us start with the simpler case, assuming
$\lambda = 1 + \varepsilon$ with $\varepsilon = \varepsilon(n) \to 0$. It turns out that given the results of the previous section,
not much extra work is needed to obtain the distribution. Essentially, only one natural
extra idea is needed. Since the formal details would take some time to write out, we
shall only sketch the arguments.

In Subsection 4.6 we obtained a lower bound on the diameter by considering vertices
$x$ with a certain property $F = F_q$, $q = q_0$, depending on the $t = t_0 + t_1 + q$
neighbourhoods, where $|q\varepsilon| \le M$ was essentially bounded. (We shall repeatedly use the
observation that if some probability is $o(1)$ uniformly in $q\varepsilon \le M$ for any constant $M$,
then it is $o(1)$ if $M = M(n)$ tends to infinity slowly enough. It is often easier to think
of $M$ as constant, although in the end we need $M \to \infty$.)

One aspect of this property F q , or rather of the related property F q , was that in 
the tree $T_x$ containing $x$ and attached to the 2-core, $x$ is the unique vertex at maximal
distance from the 2-core. It turns out that a positive fraction of the trees attached to
the 2-core have more than one vertex at maximal distance, and to obtain a precise result
we must also consider such trees. But we must only count each tree once. The solution
is very natural: we consider an auxiliary random order $\prec$ on $V(G)$, and consider only
vertices $x$ such that, writing $S_x$ for the set of vertices of $T_x$ at maximal distance from
the 2-core, $x$ is the first vertex of $S_x$ in the order $\prec$.

More precisely, we modify the definition of the branching process events $E_q$ and $F_q$,
by weakening the 'strong wedge condition' B(i) on page 62: instead of insisting that
the 'side branch' starting at generation $i$ dies within $i$ generations, we insist that it dies
within $i + 1$ generations (this is the weak wedge condition). In addition, writing $S_i$ for the
set of particles in the $i$th generation of the $i$th side branch, letting $S$ be the union of the
sets $S_i$ together with the initial particle, and taking a random order on $S$, we insist that
the initial particle comes first in this order; we call this the medium wedge condition.

We showed that the probability of the strong wedge condition was asymptotically 
diYY^ 1 di ~ di7o£ 2 ~ e -1 7o£ 2 , where di = F(\Xi\~ = 0) = 1 — Sj is the probability 
that the subcritical process dies by time i. Similarly, the probability of the weak wedge 
condition is asymptotically ^* ~ 7o£ 2 - 

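If we condition on the weak wedge condition, then the distribution of $S$ depends on
$\varepsilon$. However, the conditional probability that $S_i$ is non-empty is bounded by

\[ \mathbb{P}\bigl(|X_i^-| > 0 \bigm| |X_{i+1}^-| = 0\bigr) = 1 - \mathbb{P}\bigl(|X_i^-| = 0 \bigm| |X_{i+1}^-| = 0\bigr) = 1 - \frac{d_i}{d_{i+1}} = \frac{s_i - s_{i+1}}{d_{i+1}}. \]

From (76) we have $s_i \le 2/i$ for all $i \ge 1$, so $\sum_i (s_i - s_{i+1})/d_{i+1}$ converges uniformly as $\varepsilon \to 0$.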
Hence, for any $M(n) \to \infty$, the probability that any $S_i$, $i \ge M$, is non-empty tends to 0.
For fixed $i$, the distribution of $S_i$ converges as $\varepsilon \to 0$, in fact, to the distribution of the size
of the $i$th generation of the exactly critical process $\mathfrak{X}_1$ given that the $(i + 1)$st generation
is empty. It follows that, in the branching process, $|S|$ converges in distribution to some
random variable $R$ not depending on $\varepsilon$. Modifying the arguments in Subsection 4.6 we
find that when we replace the strong wedge condition by the medium wedge condition,
in place of (92) we obtain the estimate

\[ \mathbb{P}(F_q(x)) \sim \mathbb{P}(F_q) \sim 4\gamma_1 n^{-1} \lambda_*^{q} \tag{114} \]

uniformly in $|q| \le M/\varepsilon$, where $\gamma_1 = \mathbb{E}(1/R)\gamma_0$, and $\gamma_0$ is the constant in Lemma 31.

Turning to the upper bound, after much work mostly involving ruling out pathological
cases, we showed in Subsection 4.7 that for any function $M(n)$ tending to infinity,
whp any vertex $x$ that is part of a pair $(x, y)$ at maximal distance satisfies the property
$B^*(x) = \{|t_{\omega/\varepsilon}(x) - t_0 - t_1| \le M/\varepsilon\}$, together with a certain unpleasant 'acceptability'
condition $A^*(x)$. Moreover, Lemma 37 shows that the expected number of such vertices
is bounded by some function of $M$. Thinking of $M$ as constant for the moment, this
expectation is bounded. Now given that a vertex has property $B^*$, it is likely that its
relevant neighbourhood (up to $t_{\omega/\varepsilon}$) is a tree. (The expected number of edges within

sets r t (x) is bounded by 5 = rrH w ^ = 0{uj z e~ z n~ l ) = 0(A~ 1//2 ) = o(l).) We 
had to consider the non-tree case, because $\delta$ may go to zero only slowly, but after
reducing to vertices satisfying $B^*$, it is easy to check from the proof of Lemma 37 that the
probability that $A^* \cap B^*$ holds and the neighbourhood is not a tree is $o(\mathbb{P}(A^* \cap B^*))$.
It follows that (if $M$ increases slowly enough) the expected number of vertices with
$A^* \cap B^*$ holding and the neighbourhood not a tree is $o(1)$.

When considering tree neighbourhoods, acceptability becomes a much simpler condition,
closely related to the weak wedge condition. So far we considered any vertex $x$
in a pair $(x, y)$ at maximal distance. Since we are only interested in the existence of a
pair at a certain distance, we may restrict our attention to those $x$ that are first in their
tree $T_x$ in our auxiliary random order. For vertices satisfying $A^* \cap B^*$, the conditional
probability of this extra condition is asymptotically $\mathbb{E}(1/R)$, as above. Putting the
pieces together, we find that whp the diameter is realized by some pair of vertices each
of which satisfies a certain condition $F'_q$ depending on its $t = t_0 + t_1 + q$ neighbourhood,
where again $|q\varepsilon| \le M$. This condition is the event that $G_{\le t}(x)$ is a tree and that the event
$A_t \cap B_t$ considered in Lemma 37, modified to the medium wedge, holds. Also, modifying
the proof of this lemma as indicated above, the probability that a vertex satisfies this
condition is

\[ \mathbb{P}(F'_q(x)) = \mathbb{E}(1/R)(1 + o(1))\,4\gamma_0 \varepsilon^3 \lambda_*^{t - t_1} \sim 4\gamma_1 \lambda_*^{q} n^{-1}. \]

Now the precise details of $F_q(x)$ and $F'_q(x)$ are rather different. However, the definitions
are such that $F_q(x)$ implies $F'_q(x)$. (Firstly, in defining $F_q(x)$ we insisted that $G_{\le t}(x)$ is
a tree. Secondly, via the condition D = D\ Pi D[ fl D%, we ensured that |r t /(a;)| < uj/e 
for < t' < t. Thirdly, via A we ensured that for all t' up to t^ — r > to — 2M/e, which is 
much larger than h, there is a unique particle in each generation t' with descendants in 
r^(x). Finally, we imposed the (there strong, but now medium) wedge condition on all 
the side branches starting up to time (at least) to — 2M/e. This implies the (modified) 
form of (h, ^-acceptability in F q .) 

Since $\mathbb{P}(F_q(x)) \sim \mathbb{P}(F'_q(x))$, and the expected number of vertices with $F_q(x)$ is (for
$M$ fixed) $O(1)$, it follows that for each $q$, whp every vertex with property $F'_q(x)$ also
has $F_q(x)$. We shall essentially consider only a bounded number of values of $q$ (again,
a number that tends to infinity arbitrarily slowly), so this holds whp for all such values.
Thus, whp, the diameter is equal to the maximum distance between vertices with
property $F_q(x)$ for suitable $q$. This also applies if $M \to \infty$ slowly enough. We may thus
forget about $F'_q(x)$.

Now the condition $F_q(x)$ says that the (medium) wedge condition holds, that
$t(x) = t_{\omega/\varepsilon}(x) \ge t_0 + t_1 + q$, and that certain other technical conditions hold. We shall need
to know a little more, namely roughly how large $t(x)$ is. From the remarks above, we
may ignore $x$ with $t(x) > t_0 + t_1 + M/\varepsilon$. For $-M^2 \le i \le M^2$, let $q_i = i/(M\varepsilon)$. Let us
say that $x$ is of type $i$ if $F_{q_i}(x) \setminus F_{q_{i+1}}(x)$ holds; this corresponds roughly to the wedge
condition plus $q_i \le t(x) - t_0 - t_1 < q_{i+1}$. Let $N_i$ be the number of type $i$ vertices. With
$M$ constant, applying (114) twice shows that $\mathbb{E}N_i$ is asymptotically what it should be,
and as usual this extends to $M \to \infty$ slowly enough, in which case

\[ \mathbb{E}N_i \sim 4\gamma_1 e^{-q_i \varepsilon}/M, \]

since $\lambda_*^{q} = (1 - \varepsilon + O(\varepsilon^2))^{q} \sim e^{-q\varepsilon}$ if $q\varepsilon$ does not grow too fast.
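
The approximation behind the last display can be checked numerically: applying (114) twice gives $\mathbb{E}N_i \sim 4\gamma_1(\lambda_*^{q_i} - \lambda_*^{q_{i+1}})$, and the claim is that the bracket is close to $e^{-q_i\varepsilon}/M$. The sketch below is an illustration only. It assumes that $\lambda_*$ is the usual dual parameter, i.e. the solution in $(0, 1)$ of $\lambda_* e^{-\lambda_*} = \lambda e^{-\lambda}$; this definition is not stated in the present section, and all function names and numerical values are ours.

```python
import math

def dual_parameter(lam):
    """Solve x * exp(-x) = lam * exp(-lam) for x in (0, 1) by bisection (lam > 1).
    Assumption: this is the definition of lambda_* (the usual dual parameter)."""
    target = lam * math.exp(-lam)
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # x * exp(-x) is increasing on (0, 1), so bisection applies
        if mid * math.exp(-mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def type_count_ratio(eps, M, i):
    """(lam_*^{q_i} - lam_*^{q_{i+1}}) * M / exp(-q_i * eps), with q_i = i / (M * eps)."""
    lam_star = dual_parameter(1.0 + eps)
    q_i, q_next = i / (M * eps), (i + 1) / (M * eps)
    diff = lam_star ** q_i - lam_star ** q_next
    return diff * M / math.exp(-q_i * eps)

eps, M = 1e-3, 50                 # illustrative values: eps small, M large
print(dual_parameter(1.0 + eps))  # close to 1 - eps
for i in (-10, 0, 10):
    print(i, type_count_ratio(eps, M, i))   # close to 1
```

The ratio differs from 1 by roughly $1/(2M)$, consistent with the requirement that $M \to \infty$.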

Let us say that $x$ is plausible if it is of type $i$ for some $-M^2 \le i \le M^2$. From the
comments above, whp the diameter is realized by a pair of plausible vertices.

Now, the precise technical conditions in the definition of type $i$ vertices are as in
Subsection 4.6; as there, these allow us to calculate 2nd moments, and indeed $r$th
moments for any fixed $r$. More precisely, given a sequence $\mathbf{i} = (i_1, \ldots, i_r)$, let us say
that a sequence $(x_1, \ldots, x_r)$ of distinct vertices is an $r$-tuple of type $\mathbf{i}$ if each $x_j$ is of
type $i_j$. Such an $r$-tuple is good if the relevant trees witnessing this are disjoint, and
bad otherwise. Arguing as in Subsection 4.6, the expected number of good $r$-tuples is
what it should be, namely $(1 + o(1)) \prod_{j=1}^{r} \mathbb{E}N_{i_j}$ (which is $O(1)$ if $M$ is fixed), and the
expected number of bad $r$-tuples is $o(1)$. This shows that all fixed mixed moments of
the sequence $(N_{-M^2}, \ldots, N_{M^2})$ converge to what we expect, and thus that (for $M$ fixed)
the sequence $(N_i)$ converges in distribution to a sequence of independent Poisson random
variables.

Turning to the diameter, let $P$ be the number of unordered pairs $\{x, y\}$ of plausible
vertices with $d(x, y) \ge d = d_0 + c\varepsilon^{-1}$, where $c$ is constant. We aim to understand
$\mathbb{P}(P > 0)$ by evaluating the factorial moments $\mathbb{E}_k(P) = \mathbb{E}(P(P - 1) \cdots (P - k + 1))$. Now
$\mathbb{E}_k(P)$ is the expected number of $k$-tuples of distinct pairs with the relevant property.
It may be that several pairs involve the same vertex; in general we can write $\mathbb{E}_k(P)$ as a
sum over integers $r \le 2k$ and graphs $H$ on $\{1, 2, \ldots, r\}$ with $k$ edges of the expectation
of the number of $r$-tuples of plausible vertices in which certain specified pairs are at
distance at least $d$ and the others are not. We evaluate this by summing over the
types of the relevant vertices. Thus we must evaluate the expected number of $r$-tuples
$(x_1, \ldots, x_r)$ of type $\mathbf{i}$ in which $k$ specified pairs are at distance at least $d$ and the others
are not.

Since there are $o(1)$ bad $r$-tuples, we consider only good $r$-tuples. Finally, we test
whether a particular sequence $(x_1, \ldots, x_r)$ has the required property by exploring the
neighbourhoods of each Xj out to the relevant distance (to + +%)■ By Lemma[[8l the 
probability that the explorations are disjoint and each $x_j$ is of the right type is 'what it
should be', namely $n^{-r}$ times the expected number of good $r$-tuples of type $\mathbf{i}$. Suppose
this happens. Then we have not so far tested any edges outside these neighbourhoods.

Continuing to explore, the neighbourhoods grow at the expected rate whp. We
explore $t_2/2 - O(1/\varepsilon)$ further steps, by which time the neighbourhoods have size $O(\sqrt{\varepsilon n})$.
(Recall that this is the size at which they typically meet.) By this time, there are very
few (in expectation $O(1)$) vertices in two or more neighbourhoods, and whp none in
three or more. It follows that the times at which different pairs of neighbourhoods meet
are essentially independent, with distribution given by Lemma 15. This allows us to
calculate $\mathbb{E}_k(P)$, and hence $\mathbb{P}(\mathrm{diam}(G(n, \lambda/n)) \ge d) \sim \mathbb{P}(P > 0)$.

Rather than give any further details, let us describe the limiting distribution we 
obtain. It should then be clear that all expectations being 'what they should be' corre- 
sponds to convergence to the corresponding values for this limiting distribution. 

Let $\mathcal{P}$ be a Poisson process on $\mathbb{R}$ with density function $f(x) = 4\gamma_1 e^{-x}$. Note that
$\int_{x' > x} f(x')\,dx' = f(x) < \infty$ for any $x$, so with probability 1 we may list the points of $\mathcal{P}$
as $z_1, z_2, \ldots$ in decreasing order. For each $1 \le i < j$, let $T_{ij}$ be a random variable
with $\mathbb{P}(T_{ij} > x) = \exp(-e^{x})$, with these variables independent of each other and of $\mathcal{P}$.
Finally, let $D = \sup\{z_i + z_j + T_{ij}\}$. It is not hard to check that with probability 1 $D$
is finite, and the supremum is attained. Indeed, as $M \to \infty$, the probability that it is
attained by some $i, j$ with $z_i, z_j \ge -M$ tends to 1.
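
To get a feel for the random variable $D$, one can sample from a truncated version of this construction. The following is a rough sketch, not part of the proof. It assumes an intensity $c e^{-x}$ with $c$ a free parameter (the value of $4\gamma_1$ is not computed here, so $c = 1$ below is a placeholder), and it discards all points below $-M$, which by the remark above changes the supremum only with small probability when $M$ is large. All function names are ours.

```python
import math
import random

def sample_points(c, M, rng):
    """Points of a Poisson process with intensity c * exp(-x), restricted to x >= -M,
    returned in decreasing order."""
    points, gamma = [], 0.0
    while True:
        # gamma runs through the arrival times of a unit-rate Poisson process; since the
        # mass of (z, infinity) is c * exp(-z), the k-th largest point z_k satisfies
        # c * exp(-z_k) = gamma_k.
        gamma += max(rng.expovariate(1.0), 1e-300)
        z = math.log(c) - math.log(gamma)
        if z < -M:
            return points
        points.append(z)

def sample_T(rng):
    """T with P(T > x) = exp(-exp(x)): take T = log(E) with E exponential of mean 1."""
    e = max(rng.expovariate(1.0), 1e-300)   # guard against the null event e == 0
    return math.log(e)

def sample_D(c, M, rng):
    """One sample of sup{z_i + z_j + T_ij} over pairs of points z_i, z_j >= -M."""
    z = sample_points(c, M, rng)
    best = -math.inf                         # returned if fewer than two points exist
    for j in range(len(z)):
        for i in range(j):
            best = max(best, z[i] + z[j] + sample_T(rng))
    return best

rng = random.Random(0)
samples = [sample_D(1.0, 4.0, rng) for _ in range(200)]   # c = 1.0 is a placeholder
print(sum(samples) / len(samples))
```

The cost per sample is quadratic in the number of points, which is about $c e^{M}$, so only moderate values of $M$ are practical with this naive approach.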

Theorem 40. Let $\varepsilon = \varepsilon(n) > 0$ satisfy $\varepsilon \to 0$ and $\varepsilon^3 n \to \infty$, and let $\lambda = 1 + \varepsilon$. For
any constant $c$ we have

P ( diam(G(n, A/n)) > + + c/e ) _> P{D > c) 

as $n \to \infty$. □

In other words, the $O_p(1/\varepsilon)$ correction term in (5) converges in distribution to $D$
(after multiplication by $\varepsilon$).

We have proved Theorem 40 in outline above. There are a few further technical
details (such as checking that the relevant sequences of moments do not grow too fast,
so convergence of all fixed moments gives convergence in distribution), but we shall not
describe these any further.

The description of the random variable D is somewhat complicated; it seems rather 
unlikely that this random variable will have a simpler description. Given this descrip- 
tion, the branching process approach taken here seems with hindsight very natural: the 
description of D more or less forces us to consider the (exponentially distributed) times 
that the vertices take for their neighbourhoods to reach certain very large sizes, and 
then the time they take to meet after this. 

Finally, let us comment very briefly on the case $p = \lambda/n$, $\lambda$ constant. It is not that
the proof is any harder in this case (it is much easier), but the result is much harder
to describe. Again we consider vertices satisfying the medium wedge condition (which
now has probability bounded away from 0), and study the distribution of $t_\omega(x)$ for such
$x$, where $\omega = (\log n)^6$, say, in the range where $\mathbb{P}(t_\omega(x) \ge t_0)$ is of order $1/n$. From
Lemma 11 it is very easy to check that when $t_\omega(x)$ is very large, this is almost always
because for many generations there is only one neighbour whose descendants do not die
quickly, and we easily find asymptotic independence of the event $\{t_\omega(x) \ge t_0\}$ and the
wedge condition.

Approximating by a branching process, it is easy to prove an equivalent of Theorem 29,
showing that the distribution of $t_\omega(x)$ may be described (as in the $\lambda \to 1$ case) by
the tail of $Y = Y_\lambda$ near 0. But now the first complication appears: this random variable
no longer has a nice power-law tail, but asymptotically follows a power law multiplied by
a function that oscillates periodically within a constant factor. Also, when we explore
neighbourhoods and reach size $\omega$, the current neighbourhood may have any size between
$\omega$ and $\lambda\omega$; this constant factor affects the probability of joining up with another
neighbourhood within a certain time. In the end it turns out that the distribution depends
on the fractional parts of both $\log n/\log\lambda$ and $\log n/\log\lambda_*$, as indeed it must from the
form of (3). We omit the details, as a precise statement of the result would be rather
lengthy.